“Big data is the latest casualty of overcooked promises made in pursuit of a good story.” So writes Nick Heath in “Big Data: neither snake oil or silver bullet”, a Tech Republic article on the hype surrounding Google Flu Trends. The story of this Google tool, which operated from 2008 to 2015, suggests that the glorification of big data is not always justified.

The tool was launched in 2008 to predict influenza waves using data collected by Google in 25 countries. The idea behind it is simple: every time a person searches for influenza symptoms or treatments on the Google search engine, the algorithm decides whether or not to track that person and collect his or her data. With the collected information, Google was supposed to predict upcoming influenza waves even before institutions such as the World Health Organization (WHO) or the Centers for Disease Control and Prevention (CDC) in the United States.

However, a team of researchers at the University of Washington reported that the tool was 25% less predictive than the CDC's surveillance. Justin Ortiz, M.D., a clinical fellow at the University of Washington, stated that “because Google Flu Trends estimates of influenza-like illness may not necessarily correlate with actual influenza virus infections, we undertook this study to evaluate the validity of Google Flu Trends influenza surveillance by comparing it to a gold standard of CDC’s national surveillance for influenza laboratory tests positive.” The outcome showed that detecting influenza is much more complex than the developers of Google Flu Trends may have thought.
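The kind of comparison the researchers describe, a search-based estimate checked against a laboratory gold standard, can be sketched in a few lines of Python. This is only an illustration and not the study's actual method: the function name `pearson` and both weekly series are hypothetical, with numbers invented for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly values, invented for illustration only.
search_based_estimate = [1.2, 1.8, 2.9, 4.1, 3.0, 2.2]   # estimate from search queries
lab_confirmed_positive = [0.9, 1.1, 2.5, 3.2, 3.4, 1.5]  # share of lab tests positive

print(round(pearson(search_based_estimate, lab_confirmed_positive), 2))
```

A coefficient close to 1 would mean the search-based estimate follows the laboratory data closely; the lower it is, the less the searches tell us about confirmed infections.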

One major flaw in the design of the algorithm was surely that it did not account for most people's lack of medical expertise. Often, when people see a doctor believing they have influenza, their symptoms turn out to be caused by other illnesses. Consequently, it is a serious mistake to build an algorithm on the search queries of people who are likely to misdiagnose themselves.
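A small calculation shows why self-diagnosis distorts the signal. The sketch below is a simplified model with hypothetical parameters (the function name, the infection rate and both search probabilities are assumptions chosen for illustration, not measured values).

```python
def share_of_searches_from_real_flu(flu_rate, search_if_flu, search_if_not_flu):
    """Fraction of flu-related searches made by people who really have influenza."""
    searches_flu = flu_rate * search_if_flu              # searchers who are infected
    searches_other = (1 - flu_rate) * search_if_not_flu  # searchers with another illness or none
    total = searches_flu + searches_other
    if total == 0:
        raise ValueError("no searches at all")
    return searches_flu / total

# Hypothetical values: 2% of people infected, half of them search for flu
# symptoms, and 5% of everyone else searches for the same terms.
share = share_of_searches_from_real_flu(
    flu_rate=0.02, search_if_flu=0.5, search_if_not_flu=0.05
)
print(f"{share:.0%} of flu-related searches come from actual flu cases")
```

With these invented numbers, only about one search in six comes from someone who actually has influenza, so the raw query count says more about how worried people are than about how many are infected.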

There is a growing realization that big data analytics is still plagued by the same sampling errors, biases, and methodological hurdles as traditional information gathering, and that it works best when it is “ground-truthed” against traditional surveillance networks.

The assumption that big data is a substitute for, rather than a supplement to, traditional data analysis is called “big data hubris”. The error described above lies in forgetting that a large quantity of data does not allow one to ignore foundational issues of measurement, construct validity, reliability, and dependencies among data. As a result, the output cannot be reliable when the input is distorted.

In the end, Google published the data collected under this failed project until 2015 to help the CDC build its own, better-performing algorithms. It remains to be seen whether future algorithms will learn the lesson of the Google Flu Trends flop. Moreover, with the steady development of artificial intelligence and deep learning, future algorithms could become far more effective, for example by being fed medical documentation.

Tun Hirt
Master 2 Cyberjustice, Class of 2018-2019

Sources:
Big Data: Lessons from Google Flu, Al Gore Rhythm, March 10, 2017
Google Flu Trends: A case of Big Data gone bad?, SiliconAngle, March 24, 2014
Google Flu Trends Estimates Off, News Wise, May 10, 2010
D. Lazer et al., Science 323, 721 (2009)
D. Boyd, K. Crawford, Inform. Commun. Soc. 15, 662 (2012)
