Saturday, November 25, 2006

The largest state in the USA is ...

The Guardian today has a story about carbon dioxide emissions and air travel. But that's not what's interesting. What's interesting is this quote: "with ministers hopeful that California, the largest state in the US ...". Now residents of Alaska (roughly 4 times bigger by area) and Texas (more than 50% bigger) are going to be surprised by that.

Of course, what the author meant was most populous, rather than largest (California has around 33 million people, as against Texas, which is 2nd with about 20 million). But it's interesting how these things get muddled up inside people's heads, presumably because size and population are usually highly correlated.

It also shows (again) that if you are going to state statistics to make a point, you have to get them right.

Thursday, November 16, 2006

Milton Friedman has died

Milton Friedman, who won the 1976 Nobel prize for economics, has died. Whilst he is best known as an economist (The Guardian described him as Margaret Thatcher's monetarist guru), he also developed the Friedman test, which is a non-parametric equivalent of a one-way repeated-measures ANOVA. I mention this because it reminds us that the people who developed statistical tests didn't do it in a vacuum - they did it because they were researchers, and had a problem to solve, in economics (as here) or in some other substantive field. As I've said before, statistical tests aren't, therefore, about statistics; they are about psychology, or sociology, or health, or economics. Textbooks that appear to be about statistics are not really about statistics - they are about stuff that matters.

Wednesday, November 15, 2006

Tricky averages

Which measure of average is appropriate depends not only on the data, but also on the question. For example, here's a story from the Manchester Evening News, telling us about how much money the average person is going to spend on things like Christmas presents, food and drink. They report the 'average', which is a bit vague, but let's assume it's the mean. Should they have used the mean?

The problem with money is that it frequently has a skewed distribution - some people spend a lot, most spend some, and a few spend a little. So, what measure should we use: the mean, the median, or something more obscure like the geometric mean? For example, if 9 people spend 10 pounds on shopping, and 1 spends 910, the mean spend is 100 pounds. The median spend is 10 pounds. Which one is correct?

That depends on who you are, and what you want to know. The story is written well, because it's written from the perspective of the shops, and they report the mean. If I am a shop, I want to know how much money I am going to take in, if 1000 people come through my door. And, given that the mean is 100 pounds, the answer is 100,000 pounds.
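As a quick check of the shop's arithmetic, here is a minimal Python sketch using the toy numbers from the example (nine people spending 10 pounds, one spending 910). The variable names are just illustrative, and it assumes the 1000 customers spend in the same pattern as those ten.

```python
from statistics import mean

# Toy data from the example: nine shoppers spend 10 pounds, one spends 910.
spend = [10] * 9 + [910]

mean_spend = mean(spend)
customers = 1000  # footfall figure used in the text

# Total takings scale with the mean, because mean * n is the sum by definition.
expected_takings = mean_spend * customers

print(f"mean spend: {mean_spend} pounds")
print(f"expected takings from {customers} customers: {expected_takings} pounds")
```

Multiplying the median (10 pounds) by 1000 instead would give 10,000 pounds, which misses the big spenders entirely. That is why the mean is the right number for the shop.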

If I am a person, I might want to know how much money to spend on my friends, to look about the same as them. The mean here is useless, and we instead want the median. Spending the median (a tenner, in this case) means that half the people will spend more, and half will spend less, and we'll be in the middle.
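The same toy data can be used to see what happens from the shopper's side. This is only a sketch: the helper `outspent` is a made-up name for illustration, and it counts how many people in the group spend strictly less than a given amount.

```python
from statistics import mean, median

# Toy data from the example: nine shoppers spend 10 pounds, one spends 910.
spend = [10] * 9 + [910]

median_spend = median(spend)
mean_spend = mean(spend)

def outspent(amount, others):
    """Count how many people in `others` spend strictly less than `amount`."""
    return sum(1 for s in others if s < amount)

print(f"median spend: {median_spend} pounds")
print(f"people outspent at the median: {outspent(median_spend, spend)}")
print(f"people outspent at the mean: {outspent(mean_spend, spend)}")
```

With data this lumpy, nine of the ten people spend exactly the median, so the "half more, half less" description is only approximate here. Spending the mean, on the other hand, would mean outspending nine friends out of ten.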

(Of course, another problem is that of getting things like confidence intervals and p-values. Economists have to grapple with that one a lot, but other social scientists, thankfully, can leave it to them - most of the time).