Wednesday, November 11, 2009

Ten Statisticians and their Impacts for Psychologists

Dan Wright has written an article about Ten statisticians and their impacts for psychologists, which is well worth reading. As psychologists, we learn about and use statistics, but we don't learn enough (in my opinion) about the historical and philosophical underpinnings, which are linked to the individuals. How many psychologists know that Fisher and Pearson did not get on, for example?

As well as describing these statisticians' early work, he gives some snippets of detail about their lives - I didn't know, for example, that Jerzy Neyman had been imprisoned by the Bolsheviks after the Russian revolution (it's Neyman that we can thank for people using 0.05 as a stringent cutoff).

Two snippets he didn't mention - perhaps for lack of space. He mentions that there are only four females in a list of 100 prominent statisticians - but one of these (F N David) is named after one of the others (Florence Nightingale). In addition, he mentions Ronald Fisher (obviously), George Box, and Fisher's daughter, Joan Fisher Box (who wrote Fisher's biography), but does not mention that Joan Fisher Box acquired her name (you know where this is going, don't you) by marrying George Box.

I've also heard it said that George Box and David Cox knew each other for a long time, and thought it would be cool to publish a paper together - but they worked in very different areas. Eventually they did publish one on a method for transformations, now known as Box-Cox transformations, and sometimes just called Box-Coxing, as in "Have you tried Box-Coxing that dataset?" If anyone has any confirmation that this is true, I'd be interested.

Friday, November 6, 2009

No, REALLY don't do statistics with Excel

There was a query on the SAS mailing list today - someone got inconsistent results for confidence intervals between Excel and SAS. In Excel, they were using the confidence() function, which I'd not come across before. And I'm glad about that.

See, to calculate a confidence interval, you multiply the standard error by the critical value from the t-distribution, and then add that to (and subtract it from) the mean.
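Written out, for anyone who prefers symbols, that recipe looks something like the following. The notation is mine rather than anything from the mailing list thread: x-bar is the sample mean, s the sample standard deviation, n the sample size, and alpha is 0.05 for a 95% interval.

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
```

The s over root n part is the standard error, and the t part is the critical value with n - 1 degrees of freedom.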

You can find that value using (say) R, with the qt() function, or Excel, with the tinv() function. The t-distribution approaches the normal distribution as the sample size increases - you need a sample size of infinity for them to be exactly the same, but if the sample size is large enough, then it's close. With the normal distribution, if you want a 95% confidence interval, the critical value is 1.96, which is so close to 2 that you can pretty much use 2 and get away with it. (Around 95% of cases lie within 2 SDs of the mean in a normal distribution.)

If you have a sample of 500, then the critical value for t is 1.96 - the same as for the normal distribution. But even if your sample is as low as 50, the critical value is 2.0, which is close enough for almost anything (if you're using a computer it will work it out, and use it, to 16 decimal places, and we don't need to worry anyway).

However, this person had a sample size of 6. With a sample size of 6, the critical value is about 2.6, and that's a pretty long way off. Well, it turns out that Excel's confidence() function doesn't use the t-distribution; it uses the normal distribution. If you have a large sample, this is going to cause you no problems, but if you've got a small sample, you are going to be off. With a sample of 6, the interval comes out about a quarter too narrow (1.96 against 2.57), which is quite a long way for a confidence interval to be off.
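If you want to check those critical values without trusting either package, here is a rough sketch in Python using only the standard library. The function names (t_pdf, t_cdf, t_critical) are made up for this illustration, and the approach (Simpson's rule for the area under the t density, then bisection to find the cutoff) is just one simple way of doing it; it is not how R or Excel work internally.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (computed on the log scale)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric, so the left half contributes exactly 0.5.
    return 0.5 + total * h / 3


def t_critical(n, level=0.95):
    """Two-sided critical value for a sample of size n (df = n - 1), by bisection."""
    df = n - 1
    target = 0.5 + level / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # critical value from the normal distribution
for n in (6, 50, 500):
    t = t_critical(n)
    print(f"n={n}: t={t:.3f}, z={z:.3f}, normal/t width ratio={z / t:.2f}")
```

The last column is the bit that matters: it is how wide the normal-based interval is relative to the t-based one. For the large samples it is close to 1, and for a sample of 6 it drops to roughly three quarters.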

I googled (not Binged, of course) around this a bit, because I don't believe I'm the only person to have noticed, and I found this web page. It is called "Function Consistency Improvements in Excel 2010", and it renames the confidence() function to confidence.t(), and says: "Consistent definition with industry best practices. Confidence function assuming a Student’s t distribution."

Oh, that's great. It wasn't wrong or a cock-up or a mistake. It was just inconsistent with industry best practices. Well, I'll try that at home next time I tread dog turd into the living room / spill tomato soup on the sofa / allow the children to play with the rat poison. That wasn't wrong, or a mistake, I will claim, as I am reprimanded. In the future I will eat tomato soup at the table, in order to ensure that my eating behavior is consistent with best dining practice.