Saturday, June 30, 2007

You're a Bayesian!

I've written a bit before about Bayesian statistics, here, here and here (that last one where I stole a line from Brad Efron, who said "We can all be Bayesians when we need to be"), and also in a recently published book. I'm kind of sympathetic towards Bayesian analysis, but I very rarely do it. The basis for Bayesian analysis is that we incorporate the prior probability of a result into our analysis. Some people are positively antagonistic towards Bayesian thinking, denying that there is ever a use for it, with the selection of the prior probability being something of a sticking point. (Actually, Bayesian analysis is a lot more complex than that, and doesn't always require what are called 'informative priors', but we won't worry about that for a minute.)

However, the most recent issue of Significance had a very interesting article by Stephen Senn, in which he wrote about the TeGenero TGN1412 drug trial catastrophe which occurred in March 2006, when six volunteers received the drug and two received a placebo. The six who received the drug almost immediately had massive immune system reactions, specifically a cytokine storm, and were hospitalised for at least a month.

What we have here is the potential for a statistical analysis. We've got a 2x2 table, so let's do the stats.
                        Placebo    Drug
Cytokine Storm   Yes       0         6
                 No        2         0

We obviously can't do a chi-square test, as the sample is too small, but we can do Fisher's exact test. If we do that we get a one-tailed p of 0.036. Because it's a one-tailed test, our p-value cut-off is 0.025, so we don't have evidence that the drug caused the cytokine storm, and all the subsequent ills.
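For anyone who wants to check that 0.036, here is a minimal sketch of the one-tailed Fisher's exact calculation using only the Python standard library. The function name and the cell ordering are my own choices for illustration; with the margins fixed, the observed table is the most extreme one possible, so the p-value is just the probability of that single table.

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact p-value for the 2x2 table
        [[a, b],
         [c, d]]
    testing whether cell b is larger than expected, with all margins fixed.
    (Hypothetical helper written for this example.)"""
    row1 = a + b              # total with a cytokine storm
    col1 = a + c              # total on placebo
    col2 = b + d              # total on drug
    n = a + b + c + d
    total = comb(n, row1)     # ways to choose who has a storm
    p = 0.0
    # Sum the hypergeometric probabilities of the observed table
    # and every table more extreme in the same direction.
    for k in range(b, min(row1, col2) + 1):
        p += comb(col2, k) * comb(col1, row1 - k) / total
    return p

# Rows: storm yes / no. Columns: placebo / drug.
p_value = fisher_one_tailed(0, 6, 2, 0)
print(round(p_value, 3))  # 1/28, which rounds to 0.036
```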

But that's got to be a silly thing to say. It's obvious that the drug did cause the cytokine storm. It's not just barely significant; it's really, really obvious. Why is it so obvious? It's obvious because people don't have cytokine storms every day. In fact, if you haven't got the Spanish Flu, we're pretty safe saying that you will never have a cytokine storm. In other words, it's not just the data that we have obtained here that we need to take into account. We also need to take into account that the probability of ever having a cytokine storm is very low. In other words, we need to take into account the prior probability. And so we have just done a Bayesian analysis.
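To put rough numbers on that intuition, here is a small sketch of the Bayesian version of the argument. Every number in it is an assumption I have made up for illustration (the background rate of cytokine storms, how often the drug would cause one if it were harmful, and a deliberately sceptical prior); none of them come from Senn's article or from the trial.

```python
# All three values below are hypothetical, chosen only to illustrate the logic.
base_rate = 1e-4        # assumed chance of a cytokine storm with no drug effect
rate_if_harmful = 0.5   # assumed chance of a storm if the drug does cause them
prior_harmful = 0.001   # sceptical prior probability that the drug is harmful

drug_storms, drug_n = 6, 6         # observed: 6 of 6 on the drug had a storm
placebo_storms, placebo_n = 0, 2   # observed: 0 of 2 on placebo had a storm

def likelihood(drug_rate, placebo_rate):
    """Probability of the observed pattern, treating volunteers as independent."""
    drug_part = (drug_rate ** drug_storms
                 * (1 - drug_rate) ** (drug_n - drug_storms))
    placebo_part = (placebo_rate ** placebo_storms
                    * (1 - placebo_rate) ** (placebo_n - placebo_storms))
    return drug_part * placebo_part

like_harmless = likelihood(base_rate, base_rate)
like_harmful = likelihood(rate_if_harmful, base_rate)

# Bayes' theorem: weight each likelihood by its prior probability.
posterior_harmful = (prior_harmful * like_harmful) / (
    prior_harmful * like_harmful + (1 - prior_harmful) * like_harmless
)
print(posterior_harmful)
```

Even starting from a prior of one in a thousand, six storms in six volunteers is so improbable under the "harmless" assumption that the posterior probability of harm is indistinguishable from 1. Different made-up numbers would change the details, but it takes a very strange prior to change the conclusion.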

Tuesday, June 26, 2007

Tricia sent me an email:
Greetings and a quick query re power calculations. I've read the very clear text on your website about this [that would be here - JM], but have one question that it doesn't answer. We are planning a study of stress, coping etc in staff in paediatric oncology. The subjects will be all staff in all the UK paediatric oncology units - so the total population rather than a sample. I think I vaguely remember that if you do a total population study then you don't do a power calculation, as it's not a sample, and anyway you can't increase numbers. Is that correct?
Interesting question.

Two points to answer it.

First, if you have the whole population, you don't need to do inferential statistics at all, so power is irrelevant.

If you have a cohort of sociology students at the University of Uttoxeter, their average age might be 20 years and 3 months. There's no standard error or confidence interval on that. Because you *know* their average age. A cohort of psychology students might be 20 years, 3 months, and 1 day. The psychology students are older. There's no significance test to be done, because it's a population difference.

However, that's not what you are normally interested in. Because we aren't interested in just those cohorts of students. We don't want to know if psychology students in 2007 in York are older than sociology students, we want to know about all psychology and sociology students in general.

Similarly, we aren't really interested only in the staff who currently work in paediatric oncology units. We want to know about all possible staff in those units, and so the current population is just a sample from the population of possible staff. We can make a statement about those staff today, but tomorrow, they might have changed. You can assess it today but if you don't do a significance test, you can't say anything about it tomorrow.

So you need to do a significance test, and if you need to do a significance test, you need to do a power calculation. When you can't change the sample size, the purpose of the power calculation is to determine how large an effect you are likely to be able to detect. If you don't get a significant result, you can say "Well, the effect would have had to be as large as X for us to have had a good chance of detecting it, so whatever the effect is, it's probably smaller than X."
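As a rough illustration of finding that X, here is a sketch that turns the usual power calculation around: given a fixed sample size, it returns the smallest standardised difference between two groups (Cohen's d) that you would have a given power to detect. It uses the normal approximation for a two-sided, two-group comparison, and the group size of 100 is a made-up number, not anything from Tricia's study.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardised mean difference detectable with the given power,
    for two equal groups, using the normal approximation.
    (Hypothetical helper; a t-based calculation gives slightly larger values
    for small samples.)"""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)

# Hypothetical: 100 staff in each of two groups being compared.
mde = minimum_detectable_d(100)
print(round(mde, 2))  # about 0.40
```

With those assumed numbers, a non-significant result would suggest the true difference is probably smaller than about 0.4 standard deviations.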

Two further issues that you might not have considered.

1) How large is the population? If it's small, and there is nothing you can do about it, you might look into compromise power analysis.

2) The sample is going to be clustered, and that's going to make life tricky, because you need to take it into account in both the analysis and the power calculation. That is, you need to distinguish between within-unit effects and between-unit effects. (Is it that within a unit, the people who have coping style X are more stressed than the people who have coping style Y? Or is it that there are differences between units, in that some units have more coping style X and less stress, while within a unit there is no relation between style and stress? It's possible for those effects to be in opposite directions.)
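One common way to get a first feel for what clustering does to a power calculation is the design effect, 1 + (m - 1) x ICC, where m is the number of people per unit and ICC is the intraclass correlation (how alike people in the same unit are). The sketch below assumes equal-sized units, and the numbers in it (20 units, 25 staff each, ICC of 0.05) are invented for illustration. It is not a substitute for a proper multilevel analysis.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total_n = n_clusters * cluster_size
    return total_n / design_effect(cluster_size, icc)

# Hypothetical values, not taken from the study described above.
n_units = 20          # assumed number of units
staff_per_unit = 25   # assumed staff per unit
icc = 0.05            # assumed intraclass correlation for stress scores

deff = design_effect(staff_per_unit, icc)
n_eff = effective_sample_size(n_units, staff_per_unit, icc)
print(deff, round(n_eff))
```

Under those assumptions the 500 staff carry about as much information as roughly 227 independent people, so a power calculation that ignored the clustering would be too optimistic.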