Arguments for and against the obsolescence of the scientific method
In favor of Anderson:
Big data is replacing the traditional scientific method by offering a new way of representing the world: observation instead of understanding. Rather than establishing a firm logic or a proven model behind the observations, in most cases it is enough to have the observed data or a simple correlation. It becomes redundant to spend time and effort following the old path of scientific research and building a model that may be wrong and fail to reveal the truth. Big data is a form of empiricism that tells you directly how things work by weighing millions of correlated data points and skipping the “understanding” or “proving” step. The Google case is especially interesting because it denies the need to actually acquire knowledge and understand the information. Better data, processed with applied mathematics, can reach the answer more efficiently, and even more accurately, than building a model to find the knowledge or reasons behind the data.
This scientific revolution can be seen as a leap forward in methodology because it frees explanations and conclusions from the limitations of models. By accepting that “correlation is enough” in place of the old approach to science (hypothesize, model, test), we avoid getting stuck in the falseness of models, because the data itself reveals reality rather than a presumed underlying reality. The cases of physics and biology that Anderson mentions support this point well: correlations backed by big data are far more useful, and even more accurate, than models built on underlying hypotheses. The trend of correlation superseding causation and of abandoning coherent models is an extreme embodiment of empiricism, and it will be unstoppable in a world that emphasizes practical results.
In favor of Kitchin:
The established scientific research methodology will undergo an essential shift under the influence of Big Data and new data analytics. These allow us to explore new forms of empiricism in which correlations and the data themselves are enough, without further causation or established models. Nonetheless, traditional scientific theory will not become obsolete or be completely replaced by statistical tools and data exploration. It is important to recognize both the indispensability of theoretical science and the immaturity of this new exploratory stage of science.
It should be pointed out that Big Data is just another research tool, and it is shaped by the technology and algorithms used and by the data and environment employed. The data therefore provide oligoptic views, so, contrary to what many people assume, it is not a fully objective way of carrying out scientific research. If we rely too heavily on statistical data and trust correlation alone, without causation and more rigorous rules of inquiry, we will fail to recognize the biases and mistakes in these oligoptic views. In addition, the information in data does not arise from nowhere. Big data does not discover insights automatically; the insights rest on scientific reasoning or hypotheses. From this perspective, intensive data mining is better seen as an aid to the older theoretical scientific method. It is also essential to note that data without appropriate interpretation is only a series of numbers or text that does not speak for itself and carries little meaning. Knowledge and causation remain important for interpreting data reasonably. Perhaps we will ask “why” less often in the future, since we think Big Data presents the information directly; but without context and domain-specific knowledge, data will be misinterpreted, so the search for “why” should not be abandoned, as it improves the accuracy and correctness of data exploration.
Big data will not completely replace the traditional scientific method. Instead, the two are to some degree complementary and can be used together for greater scientific progress. Ideally, data and statistical exploration guided by theoretical science will perform better than either approach used on its own.