Monday, May 15, 2006

Stability of Measures

I was looking for a free online IQ test that I could use as an example in a class, and I came across one at www.iqtest.com. I had to make sure that it really was free, so I took the test and made sure I got a result - and it duly arrived as an email.

However, I remembered that some time ago I'd done another online IQ test, and I wondered whether it was the same one. Because I'm obsessive, I keep all my email, so I trawled through my archives, and lo and behold, there, in January 2002 (a little over four years earlier), was my result from that test. Now, I'm not going to tell you the scores (because that would either bring shame on me or be boasting, depending), but my score the second time was 14 points lower. Fourteen points! That's very nearly a standard deviation (IQ scores are usually scaled to have a standard deviation of 15).

Either I've got a lot dimmer (it's possible - January 2002 was several months before my wife gave birth to twin boys) or there's a bit of a stability issue with that test.

A sample of one doesn't tell us a great deal, but a bit more research would be interesting.
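One way to see why a single retest doesn't tell us much: even if my ability hadn't changed at all, measurement error alone produces differences between two sittings. A minimal sketch, assuming the test uses the common IQ standard deviation of 15 and has the same test-retest reliability on both occasions. The reliability values below are hypothetical, since I don't know the real figure for this test. It uses the usual standard error of measurement, SD × √(1 − r), and the standard error of a difference between two scores, which is √2 times that.

```python
from math import sqrt
from statistics import NormalDist

IQ_SD = 15  # assumption: the test is scaled to the common IQ SD of 15


def z_for_change(change, reliability, sd=IQ_SD):
    """z-score of a test-retest change, assuming no true change in ability
    and the same reliability at both sittings."""
    sem = sd * sqrt(1 - reliability)  # standard error of measurement
    se_diff = sem * sqrt(2)           # standard error of the difference between two scores
    return change / se_diff


# Hypothetical reliabilities; the real value for this online test is unknown.
for r in (0.7, 0.8, 0.9):
    z = z_for_change(-14, r)
    p = 2 * NormalDist().cdf(-abs(z))  # two-tailed p-value for a change this large
    print(f"reliability = {r}: z = {z:.2f}, two-tailed p = {p:.3f}")
```

With a reliability of 0.9, a 14-point drop comes out at a z of about -2.1; with 0.7 it's only about -1.2, which is well within what measurement error alone would produce.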

Saturday, May 13, 2006

Hypotheses, and tails

I'm a big fan of Bad Science - both as a web page and as a column in The Guardian. This week's article is about evaluating a series of studies; in particular, it talks about multiple testing and about one-tailed tests. Its description of one-tailed statistical tests doesn't go far enough, though. If you carry out a one-tailed test, you are saying that an effect in the opposite direction is meaningless and uninteresting, no matter how large it is or what its p-value. If I think that drug X will make you better, I might be tempted to carry out a one-tailed test (after all, it gives me more power). However, this means that if drug X makes you worse (and it doesn't matter how much worse; it could even kill you), the test was one-tailed, so the null hypothesis cannot be rejected.
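To make this concrete, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. It assumes a simple z-test where a positive z means the drug helps; the value z = -3.5 is hypothetical, standing in for a drug that makes patients much worse.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def p_one_tailed(z):
    """Upper-tail p-value: the alternative hypothesis is that the drug helps (z > 0)."""
    return 1 - STD_NORMAL.cdf(z)


def p_two_tailed(z):
    """Two-tailed p-value: the alternative is that the drug has an effect in either direction."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Hypothetical result: patients on drug X do far worse than controls.
z = -3.5
print(f"one-tailed p = {p_one_tailed(z):.4f}")
print(f"two-tailed p = {p_two_tailed(z):.4f}")
```

The one-tailed p-value is close to 1, so the harm is invisible to the test, while the two-tailed p-value is well below 0.001.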

I find it hard to believe that many researchers would really be willing to ignore a large effect in the unexpected direction. An interesting result is, after all, an interesting result. Using a one-tailed test just looks like cheating: it suggests that you couldn't get a significant result with a two-tailed test. Bland and Altman discuss this in a BMJ article (not Bland and Bland, as it says in the HTML). They write "In general a one sided test is appropriate when a large difference in one direction would lead to the same action as no difference at all." Martin Bland has told me that in his 30 (or so) years as a practising medical statistician, with somewhere over 300 papers to his name, he has used a one-tailed test once.
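The reason it looks like cheating is that there is a window of test statistics that are significant one-tailed but not two-tailed. A minimal sketch, assuming a z-test at alpha = 0.05; the value z = 1.8 is made up for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()

z_crit_one = STD_NORMAL.inv_cdf(1 - ALPHA)      # about 1.645: all of alpha in one tail
z_crit_two = STD_NORMAL.inv_cdf(1 - ALPHA / 2)  # about 1.960: alpha split across both tails


def significant_one_tailed(z):
    """Reject only if the effect is large in the predicted (positive) direction."""
    return z > z_crit_one


def significant_two_tailed(z):
    """Reject if the effect is large in either direction."""
    return abs(z) > z_crit_two


z = 1.8  # hypothetical test statistic that falls in the gap
print(significant_one_tailed(z), significant_two_tailed(z))  # True False
```

Any z between roughly 1.645 and 1.96 is "significant" only if you decided, before seeing the data, that the opposite direction doesn't matter.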

That paper is here. The question asked was whether, in a cohort study, heart transplant was associated with an increased risk of death. If it is associated with a decrease, or with no change, what are we going to do? More heart transplants? We can't do more heart transplants; we already do as many as we can. In this case, the rule that an effect in the opposite direction would have the same consequences as no effect is satisfied. But it's pretty rare for that to be the case.

Monday, May 08, 2006

Bad Questions

Here's a nice example of a bad question for a survey:

Q: Do You Agree With Moussaoui Getting Life In Prison For His Role In 9/11?

If you say no, do you mean:
No, he should have been sentenced to death.
Or
No, he was a schizophrenic fantasist who was innocent.