Wednesday, July 05, 2006

Statz 4 Life

I'm not sure about the second formula (11 seconds into the video, written on someone's right forearm). Is that V or the Greek letter nu? And what does df* mean? And is that eta or n on the bottom?

Very funny though.

What is probability?

Here's a quote from The Guardian:
A friend of mine got pregnant the first time she slept with her (now) husband, at the age of 43; this was after she had made a documentary on infertility and had been repeatedly told by fertility experts that her chances of conceiving naturally were "less than 5%". This may be right as a general statistic, but it wasn't right in her case: her chances of conceiving, provided she had sex at the right time of the month, were 100%.
This raises an interesting problem about what we mean by probability. Probability is pretty fundamental to statistics: we are always trying to calculate p-values and confidence intervals associated with various measures. So it might surprise (or depress, or perhaps not surprise) you that there isn't actually agreement on what a probability means. We'll come back to that; first we'll look at the error in this claim.

The author is confusing a post-hoc probability with a prior probability. You might say that the probability of me getting heads when I toss this coin is 0.5 (this would be the prior probability). I could argue that it's not: it's either 1.0 or 0.0, because it will either happen or it won't. I guess that what the author was calling a "general statistic" is a prior probability, and what the author called "her chances" was the posterior probability, the probability after the event.
But you can't use a posterior probability as a prior probability. If I say I have a 1 in 6 chance of rolling a 6 on a fair die, and every time it happens you say "Haha, see, you were wrong, the chance was 100%", and then you ignore the 5 times in 6 that it doesn't happen, well, I'm just not going to be your friend any more.
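To make the die argument concrete, here is a minimal sketch in Python. It rolls a simulated fair die many times (using Python's standard `random` module with a fixed seed; the function name `six_rates` is my own, purely for illustration) and compares the fraction of all rolls that came up 6 with the "hit rate" you get if you only look at the rolls where a 6 actually came up.

```python
import random


def six_rates(n_rolls: int, seed: int = 1) -> tuple[float, float]:
    """Roll a fair six-sided die n_rolls times and return two rates.

    overall_rate:     fraction of all rolls that came up 6 (near 1/6)
    rate_among_sixes: fraction of 6s among only the rolls that were 6,
                      which is 1.0 by construction (the "100%" view)
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    sixes = [r for r in rolls if r == 6]
    if not sixes:
        raise ValueError("no sixes rolled; try a larger n_rolls")
    overall_rate = len(sixes) / n_rolls
    # Keeping only the rolls where the event happened: it "happened" every time.
    rate_among_sixes = sum(r == 6 for r in sixes) / len(sixes)
    return overall_rate, rate_among_sixes


overall, selected = six_rates(60_000)
print(f"all rolls that were 6:   {overall:.3f}")
print(f"'six' rolls that were 6: {selected:.3f}")
```

The first number is the 1-in-6 prior borne out over many rolls; the second is what you get by counting only the successes, which is always 100% and tells you nothing about the chance beforehand.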

Anyway, let's get back to what a probability means. Probably the most common interpretation of probability is the frequentist interpretation. A frequentist interpretation is a little like the dice-rolling experiment: we know that in the long run, 1 time in 6 the die will come up 6. The big problem for the frequentist interpretation is that we need to talk about the long-run probability of this particular woman becoming pregnant. We would need to say that if we rewound this person's life thousands of times, she would become pregnant on 5% of occasions.
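A frequentist reads "5%" as a long-run relative frequency. The sketch below is a minimal illustration that assumes each hypothetical "rewind" is an independent trial with a fixed 5% chance (an assumption that is debatable for one particular person). It shows the observed frequency wandering for small numbers of trials and settling near 0.05 as the number grows. The function name `relative_frequency` is hypothetical.

```python
import random


def relative_frequency(p: float, n_trials: int, seed: int = 42) -> float:
    """Fraction of n_trials independent trials that succeed, each with probability p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    # rng.random() is uniform on [0, 1), so "< p" happens with probability p.
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials


# Hypothetical "rewinds": the same 5% event repeated more and more times.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} trials: observed frequency = {relative_frequency(0.05, n):.4f}")
```

With only a handful of trials the observed frequency can be far from 5%; the frequentist probability is the value it settles towards in the long run.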

However, in this sort of scenario that is not really sensible, because this person would become pregnant every time, just as she did this time.

The alternative is to use a Bayesian interpretation of probability, which is also sometimes called subjective probability or personal probability. A Bayesian probability can be thought of as the degree of confidence that a person would have that an event will occur. If my subjective probability of an event occurring is 5%, I would be willing to place a bet at 19:1 (or better).
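The 19:1 figure comes from turning a probability p into fair odds against the event, (1 - p)/p, which for p = 0.05 is 0.95/0.05 = 19. Here is a small sketch using Python's `fractions` module so the arithmetic stays exact; the function names are my own, for illustration. It also checks that a 1-unit bet at 19:1 has zero expected gain for someone whose degree of belief is exactly 5%.

```python
from fractions import Fraction


def fair_odds_against(p: Fraction) -> Fraction:
    """Odds against an event with probability p, expressed as (1 - p) / p to 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (1 - p) / p


def expected_gain(p: Fraction, odds: Fraction, stake: Fraction = Fraction(1)) -> Fraction:
    """Expected profit from staking `stake` on the event at `odds`:1.

    You win odds * stake with probability p and lose the stake otherwise.
    """
    return p * odds * stake - (1 - p) * stake


p = Fraction(5, 100)                    # a subjective probability of 5%
print(fair_odds_against(p))             # 19, i.e. 19:1
print(expected_gain(p, Fraction(19)))   # 0: a fair bet
print(expected_gain(p, Fraction(20)))   # 1/20: a better-than-fair bet
```

"Or better" means any odds above 19:1: at those odds the expected gain is positive for someone whose degree of belief really is 5%.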

The problem with the subjective probability is that, well, it's subjective. Probability stops being a mathematical thing, and starts being more of an opinion, and so it becomes harder to say that someone is wrong.