Saturday, February 09, 2013

Beware the Big Errors of 'Big Data' | Wired Opinion

Excerpt from Nassim Taleb's big data reality check:
"The problem with big data, in fact, is not unlike the problem with observational studies in medical research. In observational studies, statistical relationships are examined on the researcher’s computer. In double-blind cohort experiments, however, information is extracted in a way that mimics real life. The former produces all manner of results that tend to be spurious (as last computed by John Ioannidis) more than eight times out of 10.
Yet these observational studies get reported in the media and in some scientific journals. (Thankfully, they’re not accepted by the Food and Drug Administration). Stan Young, an activist against spurious statistics, and I found a genetics-based study claiming significance from statistical data even in the reputable New England Journal of Medicine — where the results, according to us, were no better than random. [...]
I am not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an increasingly larger haystack."
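
A rough illustration of the haystack point (my own sketch, not code from Taleb's article): if you generate columns of pure random noise and count how many pairs look correlated, the number of "findings" grows quickly as you add variables, even though none of them carries any real relationship. The function name, the 0.3 cutoff, and the sample size of 50 are arbitrary choices made for this example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_pairs(n_vars, n_obs=50, threshold=0.3, seed=0):
    """Count pairs of independent noise variables with |r| above threshold.

    Every column is independent Gaussian noise, so any pair that clears
    the (arbitrary, illustrative) threshold is spurious by construction.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1
        for x, y in itertools.combinations(columns, 2)
        if abs(pearson(x, y)) > threshold
    )


if __name__ == "__main__":
    for n_vars in (10, 50, 200):
        n_pairs = n_vars * (n_vars - 1) // 2
        hits = count_spurious_pairs(n_vars)
        print(f"{n_vars:4d} variables, {n_pairs:6d} pairs, {hits:5d} spurious hits")
```

The number of pairs grows roughly with the square of the number of variables, so the count of chance "hits" grows with it, while the number of real needles (zero, in this toy setup) stays the same.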
