Big data analytics has been touted in the media as the revolutionary technology of the 21st Century – but is it really that big?
According to enraptured journalists and the vendors of big data solutions, a new age is awaiting us, an age where everything is known, analysed and acted upon, a world where big data knows us better than we know ourselves.
Journalist Michael Coren described the phenomenon in typically gushing terms:
“Every century, a new technology – steam power, electricity, atomic energy, or microprocessors – has swept away the old world with the vision for a new one. Today, we seem to be entering the era of Big Data.”
Back down on Earth, the reality is looking a little different. As major firms ramp up huge investments in big data, hoping to capture insight from new social and mobile data sets, and organisations scramble to employ data scientists, a number of notable failures have brought the big data project into question.
Meanwhile, crowdsourcing ventures have gone from strength to strength in fields as wide as science, medicine and rescue missions, and have had remarkable successes in areas where big data has failed to make any significant progress.
We at ITProPortal take a look at the most recent notable failures of big data, and compare that record to the successes of crowdsourcing.
Which one does the future belong to – big data or crowdsourcing?
Google flu tracker failure
About a year ago, the media was gushing about the predictive capabilities of Google Flu Trends (GFT), the search giant’s algorithm-powered flu tracking tool. Google’s global flu map was awash in red last January and GFT even appeared to be several steps ahead of the Centers for Disease Control and Prevention (CDC) in calculating the scope and severity of the 2012-13 flu season.
Fast-forward to March 2014, when it turns out GFT has been routinely over-estimating flu prevalence, at times by as much as double the actual level. In fact, GFT badly miscalculated the severity of last year’s flu season, predicting double the number of flu-related doctor visits that wound up being reported to the CDC.
Google’s flu-tracking system was also off in the 2011-12 season, when it “overshot the actual level [of flu prevalence] by more than 50 percent,” while GFT “completely missed the non-seasonal 2009 influenza A–H1N1 pandemic,” according to researchers from Harvard and Northeastern University.
This embarrassing oversight by one of big data’s flagship projects was exposed in a paper by researchers from Harvard University, who described the phenomenon as “big data hubris.”
Big data hubris
According to the researchers, “big data hubris” is the “often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”
“We have asserted that there are enormous scientific possibilities in big data,” the team conceded. “However, quantity of data does not mean that one can ignore foundational issues of measurement, construct validity and reliability, and dependencies among data.”
The team argued that in order for big data analytics to be effective, it actually required a large amount of human input to tweak and coax the data into “telling a story”. This often led to a vastly unscientific approach to data, and undermined its claims to being an evidence-based system of analytics.
This is the big problem with GFT and big data systems like it: administrators often alter inputs in an ad-hoc, unscientific fashion to better fit their predictive models.
So while the GFT has attempted to decipher tens of millions of Google searches to puzzle out search terms that correlate strongly with a rise or fall in flu cases, “[t]he odds of finding search terms that match the propensity of the flu but are structurally unrelated, and so do not predict the future, were quite high,” the researchers said.
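The researchers' point about chance matches can be illustrated with a small simulation. The sketch below is not GFT's method: it assumes a made-up "flu" series and 2,000 made-up "search term" series, all pure random noise, and the names pearson and best_spurious_match are hypothetical. It picks the term that best matches flu over the first 26 weeks, then checks the same term on the following 26 weeks.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_match(n_terms=2000, n_weeks=26, seed=0):
    """Return (fit, holdout) correlations for the best-matching noise series.

    Assumption: both the "flu" signal and every "search term" are
    independent Gaussian noise, so no term has any real predictive power.
    """
    rng = random.Random(seed)
    flu = [rng.gauss(0, 1) for _ in range(2 * n_weeks)]
    best_fit, best_holdout = -1.0, 0.0
    for _ in range(n_terms):
        term = [rng.gauss(0, 1) for _ in range(2 * n_weeks)]
        # Score the term on the first n_weeks only (the "training" period).
        fit = pearson(flu[:n_weeks], term[:n_weeks])
        if fit > best_fit:
            best_fit = fit
            # Score the same term on the later, unseen weeks.
            best_holdout = pearson(flu[n_weeks:], term[n_weeks:])
    return best_fit, best_holdout


if __name__ == "__main__":
    fit, holdout = best_spurious_match()
    print(f"best in-sample correlation: {fit:.2f}")
    print(f"same term on later weeks:   {holdout:.2f}")
```

With enough candidate terms, one of them will usually track the flu series closely in the fitting period by luck alone, yet there is no reason for that match to persist in later weeks. That is the structural risk the researchers describe.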
Businesses are finding it increasingly difficult to wade through the hype surrounding big data and get to the actual facts underneath, some of which are not flattering to the burgeoning new science.
Elsewhere in the world of medicine, scientists are running into other problems with the much-touted power of big data in diagnosis and treatment. Late last year, Douglas Johnston, a surgeon with the Cleveland Clinic, told an MIT-chaired panel that the value of big data analytics was severely limited:
“There are certain conditions, where there are very well valued predictive models,” he said. Unfortunately, “The per cent of patients whose conditions fit into those models is probably 10 per cent.”
There are big blind spots, too, that can throw off analytical models. Johnston argued that if big data is to be useful at all, scientists have to “develop predictive models that handle patients with more than one condition. That’s one of the most difficult things for us to measure,” he added.
The Harvard team concluded that “Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution,’ where we recognise that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.”
So big data is, in generous terms, still in its teething stages.
Shooting for the moon with crowdsourcing
Meanwhile, successes in crowd-sourced systems have shown the strength of the model, and the reliability of results produced.
The University of Colorado Boulder recently carried out a study into the efficiency of crowdsourcing when it comes to mapping craters on the moon. In it, the work of eight NASA-employed pros was compared against a group of thousands of amateurs who counted craters via the science-themed, crowd-source gathering place “CosmoQuest”.
The two teams set about counting the number of craters larger than about 35ft in diameter.
The result? When both groups’ results were averaged together, they ended up being statistically the same.
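One way to see why a large crowd of imprecise counters can match a handful of experts is to average simulated counts. The sketch below is only an illustration, not the study's method: the true crater count of 100, the noise levels and the function name simulate_crater_counts are all assumptions, and it supposes that individual errors are unbiased and independent. Only the group sizes (eight professionals, thousands of volunteers) come from the study as reported.

```python
import random


def simulate_crater_counts(true_count=100, n_experts=8,
                           n_volunteers=2000, seed=1):
    """Return (expert_mean, volunteer_mean) for simulated crater counts.

    Assumptions: experts are off by about 5 craters each, volunteers by
    about 30, and nobody is systematically too high or too low.
    """
    rng = random.Random(seed)
    experts = [rng.gauss(true_count, 5) for _ in range(n_experts)]
    volunteers = [rng.gauss(true_count, 30) for _ in range(n_volunteers)]
    expert_mean = sum(experts) / n_experts
    volunteer_mean = sum(volunteers) / n_volunteers
    return expert_mean, volunteer_mean


if __name__ == "__main__":
    expert_mean, volunteer_mean = simulate_crater_counts()
    print(f"expert average:    {expert_mean:.1f}")
    print(f"volunteer average: {volunteer_mean:.1f}")
```

Under these assumptions the individual volunteer errors largely cancel out, so the crowd average lands close to the expert average even though any single volunteer is far less reliable. If the crowd shared a systematic bias, averaging would not remove it.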
Now, if CosmoQuest were to turn crater-counting into a game of sorts with some kind of real-life reward, like vouchers, gift cards or even cash, wouldn’t you be up for a bit of mind-numbing astronomy?
“Our view now is to let the scientists focus on the science,” one of the researchers said, “and willing volunteers can do crowdsourcing work by marking craters – even if they do it at night while watching television.”
The teams have had remarkable success so far.
Another example is the new intergalactic gaming app launched by Cancer Research UK, which allows smartphone users to contribute to scientists’ understanding of cancer by playing a somewhat addictive space-themed arcade game.
The game, Play to Cure, asks users to map a route through the densest areas of the valuable Element Alpha, which can be sold and used to buy ship upgrades.
The catch is that the players are in fact analysing vast amounts of genetic cancer data, work which normally takes trained scientists a huge number of man-hours to accomplish.
“Future cancer patients will be treated in a more targeted way based on their tumour’s genetic fingerprint and our team is working hard to understand why some drugs work and others won’t,” said Professor Carlos Caldas from Cancer Research UK.
“But no device can do this reliably and it would take a long time to do the job manually. Play to Cure: Genes in Space will help us find ways to diagnose and treat cancer more precisely – sooner.”
Not only that, but the outpouring of public interest in the disappearance of Malaysia Airlines flight MH370 sent millions of people to crowdsourcing websites, poring over thousands of square miles of high-definition satellite images looking for wreckage. The interest in participation was so great that popular crowdsourcing sites crashed under the load of traffic, and the search became the world’s largest ever crowdsourcing project.
These are tasks too complex for algorithms to parse and analyse. It seems that for the most crucial analysis, crowdsourcing is the go-to solution.
Another example is the US city of Boston. The Boston city council was one of the first in the world to discover the power of crowdsourcing via the citizen-carried sensor – the humble smartphone.
When they released a fully-integrated smartphone app for the good people of Boston to report issues such as potholes and graffiti when they found them, the city planners had no idea that some civic-minded Bostonians would actually go out for walks with the specific aim of looking for problems just to report them over the app.
This has become something of a minor hobby in the Walking City, and its success has meant that the city authorities find and fix more problems than ever before. This means the authorities can save money by intervening in problems before they get worse, and avoid litigation from injury lawsuits and other liability problems.
It’s been so successful, the UK government began piloting a similar scheme to report potholes back in January.
In Australia, researchers have had similar success with an app for mapping noise pollution across major cities, using citizen input to build a map of the city’s noise hotspots.
Not only is NASA crowdsourcing its latest generation of software development, but Microsoft has also released a crowdsourcing solution for politicians called TownHall, aimed at letting candidates for public office build Web sites that can foster community discussion about issues and campaign topics.
Comparing crowdsourcing directly with big data is a bit like comparing apples and orange-scented car air fresheners, of course.
However, as the problems they’re employed to solve increasingly begin to overlap, where does the future lie? For us, crowdsourcing looks like the more promising candidate, despite the hype surrounding big data.
Are you a big data evangelist? Are you a nut for crowdsourcing? Let us know your thoughts in the comments section below.