Types of Errors
Andrew Gelman's blog had a post today which linked back to an old post where he discussed the different kinds of errors. In our forthcoming book, we discuss the fact that all null hypotheses are false. If that's the case (and it is), you can never make a type I error, because you make a type I error when you reject a null hypothesis that is true. And if you know that all null hypotheses are false, you can't make a type II error either. A type II error is when you say that the null hypothesis isn't false when it is, and you won't say that, because you know it is false.
So what do we do? We can't say that we never make an error, so instead we need new kinds of errors, which Gelman calls Type M and Type S.
A type M error occurs when you get the magnitude of an effect wrong. If we test for a correlation between two measures, and we find that the correlation is not significant, we (should) say, 'Well, whatever the effect is, it's small (and we don't know what direction it's in).' If the correlation is actually large, we've made a type M error.
A type S error occurs when we get the sign of an effect wrong. Let's say we find a significant positive correlation, and conclude that the population correlation is positive. If the population correlation is actually negative, we've made a type S error.
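To make the two definitions concrete, here is a minimal simulation sketch in Python. It uses the definitions as given in this post (not Gelman's Bayesian treatment), and everything in it is an assumption chosen for illustration: the bivariate normal data, the true correlations of -0.05 and 0.5, the sample sizes, and the Fisher z approximation used as the significance test.

```python
import math
import random
from statistics import NormalDist


def sample_correlation(rho, n, rng):
    """Draw n pairs from a bivariate normal with correlation rho; return Pearson's r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, alpha=0.05):
    """Two-sided test of rho = 0 using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def error_rates(rho, n, reps=2000, seed=1):
    """Return (share of samples significant with the wrong sign,
    share of samples not significant)."""
    rng = random.Random(seed)
    wrong_sign = 0
    not_significant = 0
    for _ in range(reps):
        r = sample_correlation(rho, n, rng)
        if is_significant(r, n):
            if r * rho < 0:  # significant, but sign opposite to the truth
                wrong_sign += 1
        else:
            not_significant += 1
    return wrong_sign / reps, not_significant / reps


# Type S: the true correlation is slightly negative, yet some samples
# give a significant positive correlation.
type_s_rate, _ = error_rates(rho=-0.05, n=20)

# Type M (in this post's sense): the true correlation is large, yet many
# small samples are not significant, inviting the conclusion "it's small".
_, type_m_rate = error_rates(rho=0.5, n=15)

print(f"type S rate: {type_s_rate:.3f}")
print(f"type M rate: {type_m_rate:.3f}")
```

Changing `rho` and `n` shows how both rates depend on the true effect and the sample size. A null hypothesis that is exactly true never enters into it.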
Type M and Type S errors make a lot more sense than Type I and II errors (which, as we've seen here, don't make sense). And they're a lot easier to remember. Gelman then goes into a lot of Bayesian elaboration, which I don't want to go into. I can be a Bayesian when I need to, but I've really got to need to.

3 Comments:
The argument that every null hypothesis is false is very similar to the argument that, on a philosophical level, there can be only countable sample spaces. Thus, there can be only discrete probability distributions. However, nobody thinks to abandon continuous distributions in practice.
There are different decision problems in the discussion between hypothesis testing, point estimation, interval estimation and multiple decision procedures (where the M and S error types seem to be). For a broader discussion of these topics, I like the first chapter of Erich Lehmann's Testing Statistical Hypotheses.
Well, I'd argue that it's similar in as much as arguing whether anything exists or not is similar. However, it's different in that we could ask whether thinking about discrete probability distributions as continuous probability distributions causes problems. I'd say no.
Does thinking about null hypotheses as being true or not cause problems? I'd say yes. How many times do you hear people say "X and Y have been found to be the same", when that's not true? What is true is that "X and Y have not been found to be different".
I'm not familiar with the text you refer to.
Great stuff - I like it. Yes, type I and type II errors become confusing when you try modelling these things. Or even thinking about them. If you're measuring reaction times, say, what are the chances that two subsets of a population will have exactly the same mean RT on a task? Or exactly the same height. Null hyp's are odd when they involve equality!
I've found that playing around with big populations in R with different means and variance helped. So you set things up so you know the two means are almost the same; try different variance; and then try sampling from the population with different n's.
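The comment describes doing this in R; the same exercise can be sketched in Python as below. The specifics are assumptions for illustration only: the means of 100.0 and 100.5, the standard deviations, the sample sizes, drawing directly from normal distributions rather than from a stored population, and a large-sample z-test (which is somewhat liberal at the smallest n).

```python
import math
import random


def z_statistic(a, b):
    """Two-sample z statistic with unpooled variances (large-sample approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (mb - ma) / math.sqrt(va / na + vb / nb)


def rejection_rate(mean_a, mean_b, sd, n, reps, rng):
    """Share of simulated experiments where |z| exceeds 1.96 (two-sided, alpha = 0.05)."""
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(mean_a, sd) for _ in range(n)]
        b = [rng.gauss(mean_b, sd) for _ in range(n)]
        if abs(z_statistic(a, b)) > 1.96:
            rejections += 1
    return rejections / reps


rng = random.Random(42)
rates = {}
for sd in (2.0, 10.0):           # try different spreads
    for n in (20, 200, 2000):    # try different sample sizes per group
        rates[(sd, n)] = rejection_rate(100.0, 100.5, sd, n, reps=200, rng=rng)
        print(f"sd={sd:5.1f}  n={n:5d}  share significant={rates[(sd, n)]:.2f}")
```

The means really do differ in every run, so each non-significant result is a failure to detect a difference, not evidence that the groups are the same.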
BTW, had a chat the other day with someone who believed that it was dangerous to have too big an n -- apparently suddenly basal level stuff would become significant. I had words about effect sizes and actually looking at the means... (He was worried about controls in biological experiments where you want to say that a particular manipulation "has no effect"...)
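The point about effect sizes can be illustrated with a small calculation. This is a rough sketch under stated assumptions: two equal-sized groups, a z-test, and an observed standardized difference (Cohen's d) that happens to equal the hypothetical true value of 0.02. The function name `expected_p_value` is made up for this example.

```python
import math
from statistics import NormalDist


def expected_p_value(d, n_per_group):
    """Two-sided p-value from a z-test when the observed standardized
    difference between two equal-sized groups is exactly d."""
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


tiny_effect = 0.02  # hypothetical standardized difference, far too small to matter
results = {n: expected_p_value(tiny_effect, n) for n in (100, 10_000, 100_000)}

for n, p in results.items():
    print(f"n per group = {n:7d}   d = {tiny_effect}   p = {p:.2g}")
```

The effect size is the same in every row; only the p-value moves with n. That is why a huge sample is not dangerous in itself, as long as you look at the size of the difference and not just at whether it is significant.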