Sunday, 26 October 2014

Cat among the pigeons: the twelve maxims of data analysis

The ongoing surge in the number of publications does not represent a corresponding increase in scientific productivity. In part, this is because many recently published studies do not observe the following dicta.

1.    No analytical study should be published for a general readership, scientific or otherwise, unless it contributes something genuinely new to knowledge about the matter in question. It is the responsibility of the author of the study to ascertain this by gaining a thorough knowledge of the pre-existing literature, including older works as well as recent ones.

2.    No pattern in data should be presented as the result of an analysis without investigation of its meaning and significance.

3.    No analytical technique should be employed to analyse a data set unless a prior reason exists to use it.

4.    No sample should be taken unless the population from which it comes is well defined and the representativeness of the sample can be established.

5.    The smallest number of variables and the smallest data sets compatible with a reliable outcome to the analysis should be used.

6.    Inductive methodology should be used as little as possible, and only when there are not enough indications to formulate a deductive hypothesis.

7.    Correlation should be abandoned unless causality can be established by independent means.

8.    No index should ever be created unless it has a clear meaning independently of the numbers it contains.

9.    Ranks should never be assigned to phenomena with multiple meanings or significances.

10.    Data for which numbers are assigned (e.g. by expert opinion) should be used separately from data in which the numbers are derived by measurement.

11.    Principal components and factor analysis should be banned; they have no inherent meaning and do not acquire it in producing results.

12.    'Data' is a plural word.
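
To make maxim 7 concrete, here is a minimal sketch in Python, using synthetic data only. It assumes a hypothetical hidden variable z that drives both x and y, while neither x nor y has any effect on the other. The variable names, the sample size, the noise levels and the helper functions pearson and partial_corr are all illustrative choices, not drawn from any real study. The raw correlation between x and y comes out strong, but the standard first-order partial correlation, which controls for z, is close to zero.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (an assumption for illustration): z is a common cause,
# and x and y are each z plus independent noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)                  # strong, though x does not cause y
r_xy_given_z = partial_corr(x, y, z)  # near zero once z is accounted for

print(f"correlation of x and y:        {r_xy:.3f}")
print(f"partial correlation (given z): {r_xy_given_z:.3f}")
```

The sketch works only because z is known and measured here. In real data the common cause is often unobserved, which is why the maxim asks for causality to be established by independent means rather than inferred from the correlation itself.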