Saturday, November 20, 2010

Why So Many (Medical) Studies Based On Statistics Are Wrong

Without peering into the mathematical guts, here is how statistical studies actually work:
  1. Data are gathered in the hopes of proving a cherished hypothesis.
  2. A statistical model is selected from a toolbox which contains an enormous number of models, yet the one almost invariably pulled out is the hammer, or “regression”.
  3. The model is then fit to the data. That is, the model has various drawstrings and cinches that can be used to tighten itself around the data, in much the same way a bathing suit is made to form-fit around a Victoria’s Secret model.
  4. And to continue the swimsuit modeling analogy, the closer this data can be made to fit, the more beautiful the results are said to be. That is, the closer the data can be made to fit the statistical model, the more confident the researcher is that his cherished hypothesis is right.
  5. If the fit of the data (swimsuit) on the model is eye popping enough, the results are published in a journal, which is mailed to subscribers in a brown paper wrapper. In certain cases, press releases are disseminated showing the model’s beauty to the world.

Despite the facetiousness, this is it: statistics really does work this way, from start to finish. What matters most is the fit of the data to the model. That fit really is taken as evidence that the hypothesis is true.

But this is silly. At some point in their careers, all statisticians learn the mathematical “secret” that any set of data can be made to fit some model perfectly. Our toolbox contains more than enough candidate models, and one can always be found that fits to the desired, publishable tightness.
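To make the “secret” concrete, here is a minimal sketch in Python. It is only an illustration, not anything taken from the post: the function name lagrange_fit, the eight data points, and the random seed are all made up for the example, and it assumes the x values are distinct. The data are pure noise, yet a polynomial with enough coefficients passes through every point.

```python
import random

def lagrange_fit(xs, ys):
    """Return a polynomial (as a function) that passes through every (x, y) point.

    Hypothetical helper for illustration; assumes all x values are distinct.
    """
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Basis term: equals yi at x == xi and 0 at every other data x.
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

random.seed(0)                              # arbitrary seed, for repeatability only
xs = list(range(8))                         # eight made-up measurement points
ys = [random.gauss(0, 1) for _ in xs]       # pure noise: no hypothesis is true here

model = lagrange_fit(xs, ys)

# Largest gap between the model and the data it was fit to.
max_residual = max(abs(model(x) - y) for x, y in zip(xs, ys))
print("largest residual on the fitted data:", max_residual)
```

The fit is as tight as anyone could wish, and it was obtained from data that contain nothing but noise, so the tightness by itself cannot be evidence for any hypothesis.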

And still this wouldn’t be wrong, except that after the fit is made, the statistician and researcher stop. They should not!

The rest here.
