Wednesday, March 28, 2007

Scoring questionnaires in SPSS and Stata.

I sent an email to the psych-postgrads list about scoring questionnaires in SPSS. I thought I'd reproduce it here and elaborate a bit to include Stata.

First, use syntax. You can check syntax for errors, you can rerun it when you find another late questionnaire, you can save it and re-use it years later. You can give it to your friends and make yourself a more popular person.

Here are some tips for syntax:

First, SPSS doesn't mind "whitespace", so you can make your syntax
more legible, and less prone to mistakes:

Instead of:

COMPUTE AQPHYSIC = q1aq + q5aq + q7aq + q12aq + q13aq + q20aq + q24aq + q27aq + q29aq.

Write:

COMPUTE AQPHYSIC =
q1aq +
q5aq +
q7aq +
q12aq +
q13aq +
q20aq +
q24aq +
q27aq +
q29aq.

Secondly, items often need to be reversed. You could create a new variable just for each reversed item, which you might call q29aq.rev, but then you've got to remember to use that name, rather than the original, in the syntax that scores the questionnaire. Instead of doing that, create a new variable for every item in your scale (or subscale), whether or not it needs reversing.

For example, I wrote some syntax (in 1994) for scoring the EPQ-R (short form). Here's the bit that creates the lie scale:

COMPUTE = epq3 .
COMPUTE = epq8 .
COMPUTE = epq12 .
COMPUTE = epq16.
COMPUTE = epq20 .
COMPUTE = epq24 .
COMPUTE = epq29 .
COMPUTE = epq33 .
COMPUTE = epq37 .
COMPUTE = epq40 .
COMPUTE = epq45 .
COMPUTE = epq47 .

I create a new variable for each item. (I give them names ending in .jm to make it obvious what they are, and because I might be using this on anyone's dataset, and I don't know what their variables will be called.)

Next you need to reverse the items that need to be reversed:

RECODE (1=0) (0=1) .
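
To spell out what the reversal does, here is a minimal sketch in Python rather than SPSS. The function name and the responses are made up for illustration, and None stands in for a missing answer. On a scale running from low to high, the reversed score is low + high minus the original, so a yes/no item coded 0/1 swaps over, and a 1 to 5 item becomes 6 minus the response.

```python
def reverse_item(value, low=0, high=1):
    """Reverse-score one response on a scale that runs from low to high."""
    if value is None:  # leave missing responses missing
        return None
    return low + high - value


# Made-up responses to one yes/no item (1 = yes, 0 = no, None = missing)
responses = [1, 0, None, 1]
reversed_responses = [reverse_item(r) for r in responses]
print(reversed_responses)
```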

When you've done that, you can create the sums very easily, using compute statements.

There's an additional advantage to using new variables, and that is that instead of writing all the variables out, you can use "to".


compute = sum( to

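And SPSS knows that means all the variables in the file between the two you name. Because the new variables were created one after another, they sit next to each other in the file.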

BUT WAIT, there's another problem, which is missing data. Software handles this in two ways: Excel gives it a zero. SPSS either gives it a zero (if you use sum) or makes the sum variable missing (if you add the items up with +). Neither of these is right.

The solution is to use the mean score, and then multiply the mean by the number of items. If all items are completed, then the mean score multiplied by the number of items will equal the total. If an item is missing, then that item will be given the average of all the items that the person did complete.
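
Here is the arithmetic as a small Python sketch (not SPSS). The function name and the four-item example are invented for illustration, and None stands in for a missing response. With every item answered, the result equals the plain total. With one item missing, the total is scaled up as if that item had been given the person's own average, which here comes to about 2.67 rather than 2.

```python
def prorated_total(responses):
    """Mean of the completed items multiplied by the number of items."""
    answered = [r for r in responses if r is not None]
    if not answered:  # nothing completed, so no score
        return None
    return sum(answered) / len(answered) * len(responses)


complete = [1, 0, 1, 1]        # made-up responses, all four items answered
one_missing = [1, None, 1, 0]  # second item skipped

print(prorated_total(complete))     # same as the plain sum of the items
print(prorated_total(one_missing))  # (2 / 3) * 4, not 2
```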

However, if a person only completed one item out of 50, we probably don't want to give them a score for the total. The solution is mean.x, where x is the number of items that must have been completed for a score to be given.

So, if you want to only give people a score on the lie scale if they have completed at least 8 of the 12 items, you use:

compute = mean.8( to * 12.

However, it's often better not to use the total score anyway. The mean score for an item is more useful, as it's more interpretable, so just miss off the *12, and use:

compute = mean.8( to .
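
For anyone who wants to see what the mean.8 rule amounts to, here is a rough Python equivalent. The function name and the two made-up respondents are assumptions for illustration, and None plays the part of SPSS's missing value. A respondent with 9 of 12 items answered gets a mean; one with only 7 gets no score at all.

```python
def mean_with_minimum(responses, min_answered):
    """Mean of the completed items, or None if too few were completed."""
    answered = [r for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered)


# Two made-up respondents on a 12-item yes/no scale
nine_answered = [1, 0, 1, 1, 0, 1, 1, 0, 1, None, None, None]
seven_answered = [1, 0, 1, 1, 0, 1, 1, None, None, None, None, None]

print(mean_with_minimum(nine_answered, 8))   # mean of the 9 completed items
print(mean_with_minimum(seven_answered, 8))  # None: fewer than 8 completed
```

Multiplying the first result by 12 would give the prorated total instead of the item mean.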

You can also do this in Stata, and it's far, far easier. In Stata, you use the -alpha- command. The command works out which items need to be reversed (very, very occasionally it gets it wrong though), and calculates a mean score.

Stata does care about carriage returns (unless you tell it not to) so you can't use the same trick to clarify your code.

In Stata, you would write (all on one line):
alpha epq3 epq8 epq12 epq16 epq20 epq24 epq29 epq33 epq37 epq40 epq45 epq47 , generate ( min(8) item

The generate() option tells Stata to generate a new variable. min(8) tells Stata to only include cases that have answered at least 8 items, and item tells Stata to do an item analysis, so that you can see which items Stata thought should be reversed.


Blogger Andy said...

You also need an "EXECUTE ." to get it to do anything, of course.

5:56 PM  
Blogger J said...

Andy, as ever, speaks the truth. Well, almost the truth.

SPSS won't calculate the scores until you run an execute OR until you try to use the variables. If you've got a really, really big dataset, it's worth leaving the execute off, until you need the variable. When you do some analysis, SPSS will do the scoring and the analysis with one run through of the data, saving you some time.

Incidentally, by really, really big, I mean somewhere over 100,000 cases. Less than that, and it doesn't really matter.


6:56 PM  
Blogger Gareth Hagger-Johnson said...

Why do you have to use "thru" for RECODE, and "to" for COMPUTE, when referring to sets of variables? SPSS is quirky.

2:56 PM  
