2017-04-07

Is data science a new norm of science? Not that new

When I first read Anderson's "The End of Theory" in 2015, seven years had passed since the article was published. To me, it made some sense in declaring that theories, or even the whole system of science built from axioms and theorems, may not be so necessary if we can explain everything by what data, or data mining, tells us.

To me it was exactly in line with the idea of Mendeleev's periodic table: when we are unsure of what rules lie behind the phenomena, something built on empirical observations (or loads of experimental results plus literature) can provide a massive, inclusive table, like a dictionary, in which we can simply look up the answer to whatever question we ask. In the big data era, building such a table is simply more feasible, given the quantity and quality of data now available. And once we really have this table, why would we still want theories? We don't. The lookup table is more than enough for us.

But at that time, I was madly seeking theories in life science, which I believed would be the key to real breakthroughs in many areas, from cancer research to induced stem cells. The idea of a lookup table did not look appealing to me. At the very least, it seemed a bit ahead of its time.

Interestingly, I had just read another article, "On the Tendencies of Motion", a satirical mock research paper written by a group of real and serious scientists. The article was written in 1981, about 27 years earlier than Anderson's. To some extent, it did better than Anderson in explaining how data science is carried out. In a really clumsy manner, using dirty, heterogeneous and inconsistent data, and a really slow "computer" composed of "brothers of the monastic orders, each working an abacus and linked in the appropriate parallel and serial circuits by the abbots", this fake research project came close to rediscovering Newton's second and third laws.

The fake and clumsy research mocked in that article is now actually becoming the norm of data science, thanks to improvements in computational hardware and software, the so-called big data infrastructure. Why do we give up the effort of seeking theories, causation, rules, or the "truth"? Because we can. Enhanced data processing and computation capabilities have made all this possible. That is an argument which makes some sense.
