Twitterology: Using Twitter to Crowdsource Real-Time Linguistic Study

October 31st, 2011

Twitter is a gold mine for linguists and students of pop culture who want to see how people describe things in real time.

Twitter is many things to many people, but lately it has been a gold mine for scholars in fields like linguistics, sociology and psychology who are looking for real-time language data to analyze.

Twitter’s appeal to researchers is its immediacy — and its immensity. Instead of relying on questionnaires and other laborious and time-consuming methods of data collection, social scientists can simply take advantage of Twitter’s stream to eavesdrop on a virtually limitless array of language in action.

At the University of Texas, for example, a group of linguists and social psychologists has been monitoring Twitter to track on-the-ground sentiment over the course of the Arab Spring, particularly in Egypt and Libya. After the death of Colonel Qaddafi, the linguist David Beaver and his assistants quickly gathered thousands of Arabic-language tweets from before and after the event. They zeroed in on messages known to be from Libya by using Twitter’s system of geocoding. (Posts from cellphones, for instance, very often encode the user’s geographic coordinates.) The tweets were then automatically translated from Arabic to English and fed into a text-analysis computer program.
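To make the geocoding step concrete, here is a minimal sketch of how tweets might be narrowed down by location. It is not the Texas group's actual method: the tweet fields (`text`, `lat`, `lon`) and the rough bounding box for Libya are assumptions made up for illustration, and a rectangle like this will also catch slivers of neighboring countries.

```python
from typing import Optional, TypedDict


class Tweet(TypedDict):
    # Hypothetical record layout; real tweet data is shaped differently.
    text: str
    lat: Optional[float]
    lon: Optional[float]


# Rough, illustrative bounding box around Libya (an assumption, not exact borders).
LIBYA_BOX = {"lat_min": 19.5, "lat_max": 33.2, "lon_min": 9.3, "lon_max": 25.2}


def in_box(tweet: Tweet, box: dict = LIBYA_BOX) -> bool:
    """Return True if the tweet carries coordinates that fall inside the box."""
    lat, lon = tweet.get("lat"), tweet.get("lon")
    if lat is None or lon is None:
        # Tweets without coordinates cannot be placed, so they are dropped.
        return False
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])


def filter_geocoded(tweets: list, box: dict = LIBYA_BOX) -> list:
    """Keep only tweets whose coordinates fall inside the box."""
    return [t for t in tweets if in_box(t, box)]


sample = [
    {"text": "posted near Tripoli", "lat": 32.89, "lon": 13.19},
    {"text": "posted near Cairo", "lat": 30.04, "lon": 31.24},
    {"text": "no location attached", "lat": None, "lon": None},
]
kept = filter_geocoded(sample)
```

Only the first sample tweet survives the filter. Translation and text analysis would come after this step and are not shown here.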

Twitterology, I love it!

In this burgeoning field of Twitterology, moods are also being gauged on a more global level. Two sociologists at Cornell University, Scott A. Golder and Michael W. Macy, recently published a study in the journal Science that looked at how emotions may relate to the rhythms of daily life, across many English-speaking countries. They observed a gradual falloff in positive terms from the beginning of the workday, bottoming out in the late afternoon.
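For a feel of how a daily mood rhythm could be measured, the sketch below computes the share of positive words per hour of the day. It is only a toy under stated assumptions: the five-word lexicon, the `(local time, text)` input format, and the function name are invented here and are not taken from the Cornell study.

```python
import re
from collections import defaultdict
from datetime import datetime

# Toy lexicon, assumed for illustration; real studies use much larger word lists.
POSITIVE_WORDS = {"happy", "great", "love", "awesome", "good"}


def positive_rate_by_hour(tweets):
    """Map each hour of the day to the fraction of words that are positive.

    `tweets` is an iterable of (local_datetime, text) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for when, text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        totals[when.hour] += len(words)
        positives[when.hour] += sum(w in POSITIVE_WORDS for w in words)
    # Skip hours with no words to avoid dividing by zero.
    return {h: positives[h] / totals[h] for h in sorted(totals) if totals[h]}


sample = [
    (datetime(2011, 10, 31, 8, 0), "Great morning love it"),
    (datetime(2011, 10, 31, 16, 0), "so tired of this meeting"),
]
rates = positive_rate_by_hour(sample)
```

With these two made-up tweets, the 8 a.m. bucket is half positive words and the 4 p.m. bucket has none, which is the kind of falloff the study describes, though here it is produced by hand-picked examples.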

One criticism of “sentiment analysis,” as such research is known, is that it takes a naïve view of emotional states, assuming that personal moods can simply be divined from word selection. This might seem particularly perilous on a medium like Twitter, where sarcasm and other playful uses of language often subvert the surface meaning.
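The criticism is easy to reproduce with a word-counting scorer. The sketch below is a deliberately naive illustration, not any particular research tool: the word lists and the `naive_sentiment` function are assumptions made up for this example.

```python
import re

# Tiny illustrative word lists (assumed; not from any published lexicon).
POSITIVE = {"love", "great", "wonderful"}
NEGATIVE = {"hate", "awful", "terrible"}


def naive_sentiment(text: str) -> int:
    """Score text as (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


sarcastic = "Oh great, another Monday. I just love traffic."
score = naive_sentiment(sarcastic)
```

The sarcastic sentence scores +2 because it contains "great" and "love", even though most readers would take it as a complaint. Word selection alone cannot see the irony.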

James W. Pennebaker, a social psychologist at the University of Texas who pioneered the text-analysis program often used in this kind of research, warns that positive and negative emotion words are the “low-hanging fruit” in such studies, and that deeper linguistic analysis should be explored to provide a “richer, more nuanced view” of how people present themselves to the world.