Using sociological data from social media to predict the future

October 11th, 2011

From the Times:

Now social scientists are trying to mine the vast resources of the Internet — Web searches and Twitter messages, Facebook and blog posts, the digital location trails generated by billions of cellphones — to do the same thing.

The most optimistic researchers believe that these storehouses of “big data” will for the first time reveal sociological laws of human behavior — enabling them to predict political crises, revolutions and other forms of social and economic instability, just as physicists and chemists can predict natural phenomena.

“This is a significant step forward,” said Thomas Malone, the director of the Center for Collective Intelligence at the Massachusetts Institute of Technology. “We have vastly more detailed and richer kinds of data available as well as predictive algorithms to use, and that makes possible a kind of prediction that would have never been possible before.”

The government is showing interest in the idea. This summer a little-known intelligence agency began seeking ideas from academic social scientists and corporations for ways to automatically scan the Internet in 21 Latin American countries for “big data,” according to a research proposal being circulated by the agency. The three-year experiment, to begin in April, is being financed by the Intelligence Advanced Research Projects Activity, or Iarpa (pronounced eye-AR-puh), part of the office of the director of national intelligence.

The automated data collection system is to focus on patterns of communication, consumption and movement of populations. It will use publicly accessible data, including Web search queries, blog entries, Internet traffic flow, financial market indicators, traffic webcams and changes in Wikipedia entries.

It is intended to be an entirely automated system, a “data eye in the sky” without human intervention, according to the program proposal. The research would not be limited to political and economic events, but would also explore the ability to predict pandemics and other types of widespread contagion, something that has been pursued independently by civilian researchers and by companies like Google.

And some examples of how this technology is already being used:

So far there have been only scattered examples of the potential of mining social media. Last year HP Labs researchers used Twitter data to accurately predict box office revenues of Hollywood movies. In August, the National Science Foundation approved funds for research in using social media like Twitter and Facebook to assess earthquake damage in real time.

The accessibility and computerization of huge databases has already begun to spur the development of new statistical techniques and new software to manage data sets with trillions of entries or more.

“Big data allows one to move beyond inference and statistical significance and move toward meaningful and accurate analyses,” said Norman Nie, a political scientist who was a pioneering developer of statistical tools for social scientists and who recently formed a new company, Revolution Analytics, to develop software for the analysis of immense data sets.