I sent an email to the psych-postgrads list about scoring questionnaires in SPSS. I thought I'd reproduce it here, and elaborate a bit to include Stata.
First, use syntax. You can check syntax for errors, you can rerun it when you find another late questionnaire, you can save it and re-use it years later. You can give it to your friends and make yourself a more popular person.
Here are some tips for syntax:
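SPSS doesn't mind "whitespace", so you can spread a command over several lines to make your syntax more legible, and less prone to mistakes. So:
COMPUTE AQPHYSIC = q1aq + q5aq + q7aq + q12aq + q13aq + q20aq + q24aq + q27aq + q29aq.
can be written as:
COMPUTE AQPHYSIC =
    q1aq + q5aq + q7aq +
    q12aq + q13aq + q20aq +
    q24aq + q27aq + q29aq.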
Secondly, items often need to be reversed, so you need to create a new variable for that, which you might call q29aq.rev. But if you do that, you've got to remember to put it in your syntax to score the questionnaire. Instead of doing that, create a new variable for every item in your scale (or subscale). So every item gets a new variable, named after its scale.
For example, I wrote some syntax (in 1994) for scoring the EPQ-R (short form). Here's the bit that creates the lie scale:
COMPUTE l1.jm = epq3 .
COMPUTE l2.jm = epq8 .
COMPUTE l3.jm = epq12 .
COMPUTE l4.jm = epq16 .
COMPUTE l5.jm = epq20 .
COMPUTE l6.jm = epq24 .
COMPUTE l7.jm = epq29 .
COMPUTE l8.jm = epq33 .
COMPUTE l9.jm = epq37 .
COMPUTE l10.jm = epq40 .
COMPUTE l11.jm = epq45 .
COMPUTE l12.jm = epq47 .
I create items l1.jm to l12.jm. (I call them .jm to make it obvious what they are, and because I might be using this on anyone's dataset, and I don't know what their variables will be called.)
Next you need to reverse the items that need to be reversed:
RECODE l1.jm l4.jm l11.jm (1=0) (0=1) .
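If it helps to see the logic of reversing outside SPSS, here is a minimal sketch in Python (not SPSS syntax). The function name reverse_item, the low/high arguments and the example answers are all made up for illustration. For a yes/no item scored 0/1, reversing is the same as taking low + high minus the score, and that rule also covers longer response scales.

```python
from typing import Optional


def reverse_item(score: Optional[int], low: int = 0, high: int = 1) -> Optional[int]:
    """Reverse-score one item; a missing answer (None) stays missing."""
    if score is None:
        return None
    # For a 0/1 item this maps 1 -> 0 and 0 -> 1, like (1=0) (0=1) in RECODE.
    return low + high - score


# Hypothetical answers to a few lie-scale items (None = not answered).
lie_items = {"l1": 1, "l2": 0, "l3": 1, "l4": 0, "l11": None}
to_reverse = ["l1", "l4", "l11"]

# Reverse only the listed items and leave the others as they are.
scored = {
    name: (reverse_item(value) if name in to_reverse else value)
    for name, value in lie_items.items()
}
print(scored)
```

The point to notice is that a missing answer stays missing after reversing, which matters for the scoring step later on.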
When you've done that, you can create the sums very easily, using compute statements.
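There's an additional advantage to using new variables: instead of writing all the variables out, you can use "to".
So:
compute l.total = sum(l1.jm to l12.jm).
SPSS knows that means all the variables between l1.jm and l12.jm in the file, and because you created the new variables one after another, those are exactly the twelve items in the scale.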
BUT WAIT, there's another problem, which is missing data. Software handles this in two ways: Excel gives it a zero. SPSS either gives it a zero (if you use the sum function) or makes the sum variable missing (if you add the items up with +). Neither of these is right.
The solution is to use the mean score, and then multiply the mean by the number of items. If all variables are completed, then the mean score multiplied by the number of items will equal the total. If a score is missing, then that item will be given the average of all the items that the person did complete.
However, if a person only completed one item out of 50, we probably don't want to give them a score for the total. The solution is mean.x, where x is the number of items that must have been completed for a score to be given.
So, if you want to only give people a score on the lie scale if they have completed 8 items, you use:
compute l.total = mean.8(l1.jm to l12.jm) * 12.
However, it's often better not to use the total score anyway. The mean score for an item is more useful, as it's more interpretable, so just miss off the *12, and use:
compute l.total = mean.8(l1.jm to l12.jm) .
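To make the arithmetic concrete, here is a rough sketch in Python (again, not SPSS syntax) of what the mean.8 approach does. The function names scale_mean and prorated_total, and the two made-up respondents, are assumptions for illustration only.

```python
from typing import Optional, Sequence


def scale_mean(items: Sequence[Optional[float]], min_valid: int) -> Optional[float]:
    """Mean of the answered items, or None if fewer than min_valid were answered."""
    answered = [x for x in items if x is not None]
    if len(answered) < min_valid:
        return None
    return sum(answered) / len(answered)


def prorated_total(items: Sequence[Optional[float]], min_valid: int) -> Optional[float]:
    """Mean of the answered items multiplied by the total number of items."""
    mean = scale_mean(items, min_valid)
    return None if mean is None else mean * len(items)


# Hypothetical respondent: 12 items, 4 left blank, 6 of the 8 answered are 1.
responses = [1, 1, None, 0, 1, None, 1, 1, None, 0, 1, None]
# Hypothetical respondent who answered only one item.
too_sparse = [1] + [None] * 11

print(scale_mean(responses, min_valid=8))       # mean of the 8 answered items
print(prorated_total(responses, min_valid=8))   # that mean times 12
print(prorated_total(too_sparse, min_valid=8))  # no score: too few answers
```

The first respondent gets a score because they answered 8 items, and each blank is in effect filled with their own average. The second respondent gets no score at all, which is what you want.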
You can also do this in Stata, and it's far, far easier. In Stata, you use the -alpha- command. The command works out which items need to be reversed (very, very occasionally it gets it wrong though), and calculates a mean score.
Stata does care about carriage returns (unless you tell it not to) so you can't use the same trick to clarify your code.
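In Stata, you would write (all on one line):
alpha epq3 epq8 epq12 epq16 epq20 epq24 epq29 epq33 epq37 epq40 epq45 epq47 , generate(l_total) min(8) item
(Stata variable names can't contain a full stop, so the new variable is called l_total rather than l.total.)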
The generate option tells Stata to generate a new variable. min(8) tells it to only include cases that have answered at least 8 items, and item tells it to do an item analysis, so that you can see which items Stata thought should be reversed.