[lbo-talk] Science

Wojtek Sokolowski sokol at jhu.edu
Fri Oct 1 13:38:55 PDT 2004


James Greenstein:
> Second, Wojtek continues with a discussion of statistics in science. It
> seems to me that his discussion is one-sided, in saying that any element
> of uncertainty is held up by pundits to show that there is no certainty.
> All too often the reverse happens - very uncertain data are used to
> 'prove' a real problem, especially in health.
>
> I don't want to throw the baby out with the bathwater, but while a
> statistical correlation may be a good indication of causation, and should
> certainly prompt further investigation, it is often a very provisional
> form of evidence.

Whilst your comments about correlation and causation are obviously true, serious scientific research rarely equates the two. Instead, researchers formulate _a priori_ hypotheses that specify the causal relation, i.e. the ordering of the variables (independent, intervening and dependent), the direction of causation and, if possible, the magnitude of the effect, and then test that specification against the data. The statistical significance test makes sense only in that context: it essentially tells you that there is a 5 percent or smaller probability that results this consistent with the a priori specification would have been obtained by chance alone. Stated differently, scientists do not "dredge" the data for correlations (or at least do not admit to it); instead they develop theoretical models and then test them by looking for the correlations those models predict.
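To make that concrete, here is a minimal sketch in Python (the data, the variable names and the permutation test are my own illustration, not anything from the research being discussed): the hypothesis that X positively affects Y is fixed in advance, and the test asks how often chance alone would produce a correlation at least as strong as the one observed.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: Y depends weakly on X, plus noise.
n = 200
X = rng.normal(size=n)
Y = 0.3 * X + rng.normal(size=n)

# Observed correlation between the hypothesized cause and effect.
r_obs = np.corrcoef(X, Y)[0, 1]

# Permutation test: shuffle Y to break any real X-Y link and count how
# often a correlation as large as r_obs arises by chance (one-sided,
# because the hypothesis specified the direction in advance).
n_perm = 10_000
r_null = np.array([np.corrcoef(X, rng.permutation(Y))[0, 1]
                   for _ in range(n_perm)])
p_value = np.mean(r_null >= r_obs)

print(f"observed r = {r_obs:.3f}, one-sided p = {p_value:.4f}")
# p < 0.05 means: if there were no real relation, a result this consistent
# with the a priori hypothesis would occur by chance less than 5% of the time.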

However, the probabilistic nature of science is not necessarily related to 'statistics' as commonly understood (i.e. the use of probability samples), but to two much more profound factors, namely measurement error and the incompleteness of the model. Measurement error refers simply to the fact that we almost never measure any quantity with 100% accuracy; all measurements are approximate to some degree. That may have little effect if you calculate bi-variate correlations, but it can entirely change your results when you deal with complex or non-linear models. In such situations even a small change in the value of one variable can produce disproportionately large changes in the calculated results (the so-called butterfly effect: a butterfly flapping its wings in one part of the world may trigger a chain of events leading to a landslide in another). If only a small change in a variable is required to produce such an effect, it is difficult to tell whether that small change was "real" or merely an artifact of measurement error.
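A toy illustration of that sensitivity, assuming nothing about any particular model: the logistic map below is a standard textbook example of a nonlinear system, and two runs whose starting values differ by less than typical measurement precision quickly end up far apart.

def logistic_map(x0, r=3.9, steps=50):
    """Iterate x_{t+1} = r * x_t * (1 - x_t) and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

true_value = 0.400000        # the "real" state of the system
measured   = 0.400001        # the same state with a tiny measurement error

a = logistic_map(true_value)
b = logistic_map(measured)

for t in (0, 10, 30, 50):
    print(f"step {t:2d}: true-input run = {a[t]:.6f}, "
          f"measured-input run = {b[t]:.6f}, gap = {abs(a[t] - b[t]):.6f}")
# In a linear model the gap would stay on the order of the original error;
# here it grows until the two runs are effectively unrelated, so we cannot
# tell whether a small input difference was real or just measurement noise.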

The incompleteness of the model (aka the Goedel incompleteness theorem) refers, in one sense, to the fact that for a sufficiently complex model it may not be possible to tell whether certain propositions are or are not consistent with that model. Stated differently, it may not be possible to tell which of two mutually contradictory propositions is consistent with a particular model.

In another sense, incompleteness refers to the fact that not all relevant variables are in the model. Assume that you want to explain variation in a variable Y. You do so by correlating it with the theoretically relevant variables X1, X2 ... Xn. Suppose that you have a model with three independent variables, all of which turn out to be statistically significant (which rules out sampling error) and run in the predicted direction (which rules out spurious correlations, at least provisionally). These three variables explain, say, 30% of the variance in Y, which leaves 70% unexplained. That unexplained variance may be due to purely random factors or to factors not included in the model, but we do not know which until we actually introduce the variables representing those factors into the equation. Doing so, however, may change the previously estimated effects of the three original variables and undo the previously claimed relationship. We thus always face a dilemma: provide partial explanations that may prove spurious when more factors are added, or look for a complete explanation, which may be a wild goose chase anyway according to the Goedel theorem.
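Here is a rough sketch of that dilemma with made-up data (all names and coefficients are hypothetical, chosen so that the three-variable model explains roughly 30% of the variance, as in the example above): omitting a relevant variable Z that is entangled with X1 both hides part of the explanation and inflates the apparent effect of X1.

import numpy as np

rng = np.random.default_rng(1)
n = 1_000

X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Z  = 0.8 * X1 + rng.normal(size=n)        # omitted factor, entangled with X1
Y  = 0.5 * X1 + 0.4 * X2 + 0.3 * X3 + 1.0 * Z + rng.normal(size=n, scale=2.0)

def fit(predictors):
    """Ordinary least squares on Y; returns coefficients and R^2."""
    A = np.column_stack([np.ones(n)] + predictors)
    coefs, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)
    resid = Y - A @ coefs
    r2 = 1 - resid.var() / Y.var()
    return coefs, r2

coefs_small, r2_small = fit([X1, X2, X3])
coefs_full,  r2_full  = fit([X1, X2, X3, Z])

print(f"without Z: b(X1) = {coefs_small[1]:.2f}, R^2 = {r2_small:.2f}")
print(f"with Z   : b(X1) = {coefs_full[1]:.2f}, R^2 = {r2_full:.2f}")
# Once Z enters, the X1 coefficient drops toward its true value of 0.5 and
# the unexplained variance shrinks: the partial explanation was partly an
# artifact of the missing variable.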

So the bottom line is that science is always probabilistic or provisional and scientists usually have no problem with that. The problem starts when quacks, charlatans and spin doctors take that to claim that nothing is known with certainty and therefore their flat earth science is as good as any other type of science.

Wojtek


