Logo and Side Nav
News
Velit dreamcatcher cardigan anim, kitsch Godard occupy art party PBR. Ex cornhole mustache cliche. Anim proident accusamus tofu. Helvetica cillum labore quis magna, try-hard chia literally street art kale chips aliquip American Apparel.
Search
Browse News Archive
- March 2014
- February 2014
- January 2014
- December 2013
- November 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- April 2013
- March 2013
- February 2013
- January 2013
- December 2012
- November 2012
- October 2012
- September 2012
- August 2012
- July 2012
- June 2012
- May 2012
- April 2012
- March 2012
- February 2012
- January 2012
- December 2011
- November 2011
- October 2011
- September 2011
- August 2011
- July 2011
- June 2011
- May 2011
- April 2011
- March 2011
- February 2011
- December 2010
- November 2010
- October 2010
- August 2010
- July 2010
- June 2010
- May 2010
- April 2010
- March 2010
- February 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- June 2007
- May 2007
Thursday, March 31, 2011
The Promises and the Challenges of Big Social Data
text author: Lev Manovich
article version: 1
posted March 31, 2011
[This is the first part of a longer article – the second part will be posted in the next few days]
The emergence of social media in the middle of 2000s created a radically new opportunity to study social and cultural processes and dynamics. For the first time, we can follow imagination, opinions, ideas, and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, eyes drop on the conversations they are engaged in, read their blog posts and tweets, navigate their maps, listen to their tracklists, and follow their trajectories in physical space.
In the 20th century, the study of the social and the cultural relied on two types of data: “surface data” about many (sociology, economics, political science) and “deep data” about a few (psychology, psychoanalysis, anthropology, ethnography, art history; methods such as “thick description” and “close reading”). For example, a sociologist worked with census data that covered most of the country’s citizen; however, this data was collected only every 10 year and it represented each individual only on a “macro” level, living out her/his opinions, feelings, tastes, moods, and motivations. In contrast, a psychologist was engaged with a single patient for years, tracking and interpreting exactly the kind of data which census did not capture.
In the middle between these two methodologies of “surface data” and “deep data” was statistics and the concept of sampling. By carefully choosing her sample, a researcher could expand certain types of data about the few into the knowledge about the many. For example, starting in 1950s, Nielsen Company collected TV viewing data in a sample of American homes (via diaries and special devices connected to TV sets in 25,000 homes), and then used this sample data to tell TV networks their ratings for the whole country for a particular show (i.e. percentage of the population which watched this show). But the use of samples to learn about larger populations had many limitations.
For instance, in the example of Nelson’s TV ratings, the small sample used did not tell us anything about the actual hour by hour, day to day patterns of TV viewing of every individual or every family outside of this sample. Maybe certain people watched only news the whole day; others only tuned in to concerts; others had TV on not never paid attention to it; still others happen to prefer the shows which got very low ratings by the sample group; and so on. The sample data could not tell any of this. It was also possible that a particular TV program would get zero shares because nobody in the sample audience happened to watch it – and in fact, this happened more than once.
Think of what happens then you take a low-res image and make it many times bigger. For example, lets say you stat with 10x10 pixel image (100 pixels in total) and resize it to 1000x1000 (one million pixels in total). You don’t get any new details – only larger pixels. This is exactly happens when you use a small sample to predict the behavior of a much larger population. A “pixel” which originally represented one person comes to represent 1000 people who all assumed to behave in exactly the same way.
The rise of social media along with the progress in computational tools that can process massive amounts of data makes possible a fundamentally new approach for the study of human beings and society. We no longer have to choose between data size and data depth. We can study exact trajectories formed by billions of cultural expressions, experiences, texts, and links. The detailed knowledge and insights that before can only be reached about a few can now be reached about many – very, very many.
In 2007, Bruno Latour summarized these developments as follows: “The precise forces that mould our subjectivities and the precise characters that furnish our imaginations are all open to inquiries by the social sciences. It is as if the inner workings of private worlds have been pried open because their inputs and outputs have become thoroughly traceable.” (Bruno Latour, “Beware, your imagination leaves digital traces”, Times Higher Education Literary Supplement, April 6, 2007.)
Two years earlier, in 2005, Nathan Eagle at MIT Media Lab already was thinking along the similar lines. He and Alex Pentland put up a web site “reality mining” (reality.media.mit.edu) and wrote how the new possibilities of capturing details of peoples’ daily behavior and communication via mobile phones can create “Sociology in the 21st century.” To put this idea into practice, they distributed Nokia phones with special software to 100 MIT students who then used these phones for 9 months – which generated approximately 60 years of “continuous data on daily human behavior”.
Finally, think of Google search. Google’s algorithms analyze text on all web pages they can find, plus “PDF, Word documents, Excel spreadsheets, Flash SWF, plain text files, and so on,” and, since 2009, Facebook and Twitter content. (en.wikipedia.org/wiki/Google_Search). Currently Google does not offer any product that would allow a user to analyze patterns in all this data the way Google Trends does with search queries and Google’s Ngram Viewer does with digitized books – but it is certainly technologically conceivable. Imagine being able to study the collective intellectual space of the whole planet, seeing how ideas emerge and diffuse, burst and die, how they get linked together, and so on – across the data set estimated to contain at least 14.55 billion pages (as of March 31, 2011; see worldwidewebsize.com).
Does all this sounds exiting? It certainly does. What maybe wrong with these arguments? Quite a few things.
[The second part of the article will be posted here within next few days.]
-----------------
I am grateful to UCSD faculty member James Fowler for an inspiring conversation a few years about the depth/surface questions. See his pioneering social science research at jhfowler.ucsd.edu.