Saturday, November 17, 2007

Effect Sizes

A Davies sent a message to the psych-methods list:

I have carried out three hierarchical regression analyses (linear for attitudes and intention, and logistic for behaviour). Within these regressions I have controlled for past behaviour and baseline psychological variables, such that I want to know what the effect of the intervention is once these variables are controlled for in the early steps.

I am familiar with using means and standard deviations to calculate cohen’s d, but am unsure how to obtain these values from a regression analysis? Are there other approaches that can be used from data analysed using hierarchical regression? I understand that the R2 value can be used but as I understand it that is an indicator of the effect of adding the variable to the regression. I've also read about using the t-test statistic but when I have used it it seems to indicate a large effect of the intervention (d=.45) when actually the intervention is only marginally significant (p<.07) as a predictor of change. Does anyone have any suggestions?

To answer this question, we should think about what an effect size is, and what it's for. Sometimes the units of measurement are meaningful. If eating a tomato every day is associated with living (on average) two years longer, then there's no need to turn this into an effect size. You understand what two years means. It makes sense. You don't need an effect size, and you can easily compare it to other studies that examined the influence of eating a carrot every day.

In psychology, and other social sciences, we often have measures that are not meaningful. If one study has looked at the efficacy of Prozac for depression, and found that Prozac was associated with a difference of 8 points on the CESD; and a second study has looked at the efficacy of CBT and found a difference of 4 points on the PHQ, it's hard to compare those two effects.

The solution is an effect size, something like Cohen's d. Cohen's d is very simple. It's the difference between the two groups, divided by the standard deviation (that's a pooled standard deviation, so it's a tiny bit fiddlier, but not much).

If the SD of the CESD was 16, then 8 is half of that, so the effect of Prozac was d = 0.5. If the SD of the PHQ was 8, then 4 is half of that, so the effect of CBT was d = 0.5, and the two effects were the same.
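As a rough sketch of that arithmetic in Python: the group sizes of 50 per arm are made up for illustration, and both arms are assumed to have the same SD. Only the differences (8 and 4) and the SDs (16 and 8) come from the example above.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(mean_diff, sd1, n1, sd2, n2):
    """Difference between group means divided by the pooled SD."""
    return mean_diff / pooled_sd(sd1, n1, sd2, n2)

# Hypothetical: 50 people per arm, same SD in both arms of each study.
d_prozac = cohens_d(mean_diff=8, sd1=16, n1=50, sd2=16, n2=50)  # CESD study
d_cbt = cohens_d(mean_diff=4, sd1=8, n1=50, sd2=8, n2=50)       # PHQ study

print(f"Prozac d = {d_prozac:.2f}, CBT d = {d_cbt:.2f}")
```

With equal SDs in the two arms the pooled SD is just that SD, which is why the "half of that" shortcut works here.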

And as long as we have a dichotomous predictor, we are happy.

However, when we move to regression and multiple regression, it gets trickier, and we need something different. There are several things that we can use.

The first is the standardized effect, what SPSS calls beta. If you've only got one predictor, it's the correlation. Correlations are nice. We understand correlations. We might not like the standardized effect for a couple of reasons. One of them is that it destroys the units. If we have units we are interested in, then the standardized effect hides them. If the standardized effect of the relationship between tomatoes eaten per day and longevity is moderate (say 0.3) I might go and eat a lot of tomatoes. However, you might then tell me that eating one more tomato per day increases longevity by 12 seconds. That's pretty poor. If I only knew the standardized effect, that would be hidden from me. If your predictor is dichotomous, then the standardized effect is very silly.

The second choice is a partially standardized effect. Here, you standardize only the outcome variable, and keep the predictor unstandardized. This effect is the difference, in standard deviation units, associated with a 1 unit change in the predictor. If your predictor is dichotomous, that's Cohen's d. If it's not dichotomous, it's analogous to Cohen's d.
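Here is a minimal sketch of a partially standardized effect for a dichotomous predictor, with no covariates. The scores are invented purely for illustration, and the slope is computed by hand rather than with any particular regression package.

```python
import statistics

# Made-up outcome scores for illustration only.
control = [10, 12, 14, 16, 18]   # group coded 0
treated = [14, 16, 18, 20, 22]   # group coded 1

x = [0] * len(control) + [1] * len(treated)
y = control + treated

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

# Ordinary least squares slope for a single predictor.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))

# Standardize the outcome only: slope in SD-of-outcome units per 1 unit of x.
partially_standardized = slope / statistics.stdev(y)

print(f"slope = {slope:.2f}, partially standardized = {partially_standardized:.2f}")
```

With a 0/1 predictor the slope is just the difference between the group means. One caveat: this sketch divides by the SD of the whole outcome, so the result will come out somewhat smaller than a d computed with the pooled within-group SD.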

The third choice, if you're using hierarchical regression, is the change in R². You ask how much additional variance is explained by the predictor (or predictors) that you added to the model at each step.


In the case of logistic regression, things get harder (as a general rule, everything is harder with logistic regression). Every software program gives you the estimate of the logistic regression as output, but not every package gives the odds ratio (sometimes called Exp(B)) unless you ask for it. (In Stata, for example, you need to add the or option after the comma; in R you need to exponentiate the coefficients yourself, for example with exp(coef(model)); in SPSS and SAS it's automatic.)

You should always present the odds ratio, because it makes more sense than the estimate. But it doesn't make a lot of sense.

We can't standardize our outcome, because it's dichotomous. We could standardize our predictor, if that made sense, and present the partially standardized OR.

However, odds ratios are funny. They're funny because people don't know how to interpret them, so you should give them help. And you give them help by choosing sensible values for the covariates, and converting the parameter estimates into the probabilities associated with those values.

I'm going to use the auto data in Stata to demonstrate this (I'll give the Stata code at the end). I'm going to regress foreign (a dichotomous variable indicating whether or not a car was made outside the USA) on price (in $1000s) and mpg.

Here's the Stata output:

---------------------------
foreign | Coef.
-------------+-------------
price | .2660188
mpg | .2338353
_cons | -7.648111
---------------------------

Of course, we need to get the odds ratios instead of the linear estimates.

----------------------------
foreign | Odds Ratio
-------------+--------------
price | 1.30476
mpg | 1.263436
----------------------------
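The odds ratios are just the exponentiated coefficients, which you can check by hand. A small sketch in Python, using only the coefficients reported above (the variable names are mine):

```python
import math

# Logit coefficients copied from the Stata output above.
coefs = {"price": 0.2660188, "mpg": 0.2338353}

# Odds ratio = exp(coefficient).
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: OR = {ratio:.4f}")

# Odds ratios multiply: a 2-unit change in price ($2000) multiplies
# the odds by OR squared, not by 2 * OR.
or_price_2k = odds_ratios["price"] ** 2
print(f"price, +$2000: OR = {or_price_2k:.4f}")
```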

The two variables are in real units, so there's no need to standardize them. Holding mpg constant, a price increase of $1000 is associated with the odds of a car being foreign being multiplied by 1.3 (these are odds ratios, so they are multiplicative), and, holding price constant, one more mpg is associated with odds of being foreign 1.26 times higher. But what does that mean in terms of the probability (because that's what we're interested in) of a car being foreign?

Let's compare two cars. One that costs $4000, and one that costs $5000 (these aren't new data). We can calculate the probability that each of those cars is foreign, and that will give us an effect size.

We just need to plug our numbers into the regression equation. But hold on, we need numbers for mpg. Let's pick a low value, say 14. (I'm not going to go through all the calculations, because it will take too long. If you're not familiar with how this is done, you can either believe me, or look it up.)

Our $4k car which does 14 mpg has a 0.035 (3.5%) probability of being foreign.
Our $5k car which does 14 mpg has a 0.045 (4.5%) probability of being foreign.

So adding $1k to the price of a car increases the probability it's foreign by 1 percentage point.

But what if we chose a different number of mpg? Let's use 25.
Our $4k car which does 25 mpg has a 0.32 (32%) probability of being foreign.
Our $5k car which does 25 mpg has a 0.38 (38%) probability of being foreign.

Which means that at this value of mpg, adding $1k has increased the probability by 6 percentage points. That's rather a large difference.
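For anyone who wants to check the calculations I skipped, here is a minimal sketch in Python. It plugs the coefficients from the output above into the regression equation to get the log odds, then converts the log odds to a probability with the inverse logit. The function and variable names are mine.

```python
import math

# Coefficients copied from the Stata output above (price is in $1000s).
B_CONS, B_PRICE, B_MPG = -7.648111, 0.2660188, 0.2338353

def prob_foreign(price_k, mpg):
    """Predicted probability that a car is foreign."""
    log_odds = B_CONS + B_PRICE * price_k + B_MPG * mpg
    return 1 / (1 + math.exp(-log_odds))  # inverse logit

for mpg in (14, 25):
    p4 = prob_foreign(4, mpg)
    p5 = prob_foreign(5, mpg)
    print(f"mpg={mpg}: P($4k)={p4:.3f}  P($5k)={p5:.3f}  difference={p5 - p4:.3f}")
```

The same odds ratio of 1.3 gives quite different changes in probability depending on where you start, which is the whole point of working through the values by hand.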

You need to calculate those probabilities, and you need to calculate them at appropriate values of the other covariates, in order to ensure that the reader can interpret your regression. However, it's a difficult issue, and you'll want to read more. Two good references are:

Regression Models for Categorical and Limited Dependent Variables. by J Scott Long. (There's a 2nd edition of this book out, which I haven't seen, but it's published by Stata Press, and therefore might be Stata specific).

Data Analysis Using Regression and Multilevel/Hierarchical Models, by Andrew Gelman and Jennifer Hill.

(Tip of the Hat: Some of my thinking on this has been influenced by Thom Baguley, and thanks to Greg Meyer for a correction).



Stata Code
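* Load the example auto dataset that ships with Stata
sysuse auto
* Rescale price from dollars to thousands of dollars
replace price = price / 1000
* Logistic regression, coefficients on the log-odds scale
logit foreign price mpg
* Same model, reported as odds ratios
logit foreign price mpg, or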