Standard Deviations (Sample and Population)
There are two different kinds of standard deviation, and they are easily confused. Here's the text of an email that Darren Van Laar sent to me:
There's a bit on p. 42 that I'm unsure of, though: the population and sample standard deviation terms seem to be a bit conflated, unless I'm going crazy (possible!).
The sample SD is the unbiased estimator (it divides by n - 1) and is denoted by s. The population SD is the other one.
Also, this makes Excel correct, but it's probably just me...
I replied:
As I haven't got the book yet, I'm not exactly sure what's on page 42, but I can guess what it's about: this is an area where the terminology for the standard deviation varies slightly.
If you have a sample taken from the population (which is what you almost always have), then you use the SD that divides by N - 1. Dividing by N - 1 gives an unbiased estimate of the population variance, and its square root is the usual estimate of the population standard deviation.
Because it's the estimate of the population standard deviation, it's sometimes called the population standard deviation (this is what we call it). But it's the estimate that you should use when you have a sample, so you could call it the sample standard deviation as well.
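The unbiasedness strictly applies to the variance: taking the square root to get the SD reintroduces a small downward bias. A minimal sketch, assuming a made-up population of four values (1, 2, 3, 4) and samples of size 2 drawn with replacement, checks this exactly by averaging each estimator over every possible sample. The helper name variance is hypothetical, written only for this illustration.

```python
from fractions import Fraction
from itertools import product
import math

# Hypothetical toy population, chosen only for illustration.
population = [1, 2, 3, 4]
N = len(population)
mu = Fraction(sum(population), N)

# Population variance: squared deviations divided by N (exact arithmetic).
sigma2 = sum((x - mu) ** 2 for x in population) / N


def variance(sample, ddof):
    """Hypothetical helper: sum of squared deviations divided by (n - ddof), exactly."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / (n - ddof)


n = 2  # sample size
samples = list(product(population, repeat=n))  # all 16 equally likely ordered samples

# Averaging over every possible sample gives each estimator's exact expected value.
mean_var_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)
mean_var_n = sum(variance(s, 0) for s in samples) / len(samples)
mean_sd_n_minus_1 = sum(math.sqrt(variance(s, 1)) for s in samples) / len(samples)

print("population variance:     ", sigma2)              # 5/4
print("mean variance, / (n - 1):", mean_var_n_minus_1)  # 5/4, i.e. unbiased
print("mean variance, / n:      ", mean_var_n)          # 5/8, too small on average
print("population SD:           ", math.sqrt(sigma2))
print("mean SD, / (n - 1):      ", mean_sd_n_minus_1)   # below the population SD
```

Tiny samples exaggerate the SD's bias; it shrinks as the sample size grows.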
On the other hand, if you have measured the entire population, then the sum of squared deviations is divided by N rather than N - 1. Sometimes (Excel included), this is called the population standard deviation, because it's the standard deviation used when you have measured the population.
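The two versions differ only in the divisor. A short sketch, with hypothetical helper names sample_sd and population_sd and made-up data, computes both and cross-checks them against Python's statistics.stdev (divides by N - 1) and statistics.pstdev (divides by N); Excel's STDEV.S and STDEV.P follow the same split.

```python
import math
import statistics


def sample_sd(values):
    """Hypothetical helper: SD for a sample, sum of squared deviations / (N - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def population_sd(values):
    """Hypothetical helper: SD when the values are the whole population, divide by N."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative values

print(sample_sd(data))      # divides the sum of squares (32) by 7
print(population_sd(data))  # divides it by 8, giving exactly 2.0

# The standard library provides both versions.
assert math.isclose(sample_sd(data), statistics.stdev(data))
assert math.isclose(population_sd(data), statistics.pstdev(data))

# For the same data, they differ only by the factor sqrt((N - 1) / N).
n = len(data)
assert math.isclose(population_sd(data), sample_sd(data) * math.sqrt((n - 1) / n))
```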
In fact, it's mostly a lot of worrying about not very much, because we almost never have an entire population, so we almost never divide by N; we nearly always divide by N - 1.
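For the same data, the N - 1 version is larger than the N version by the factor sqrt(N / (N - 1)), which depends only on N. A short sketch, with a hypothetical helper sd_gap_percent and arbitrarily chosen sample sizes, prints how large that gap is.

```python
import math


def sd_gap_percent(n):
    """Hypothetical helper: how much larger (in %) the N - 1 SD is than the N SD."""
    return 100 * (math.sqrt(n / (n - 1)) - 1)


for n in (2, 5, 10, 30, 100, 1000):
    print(f"N = {n:4d}: {sd_gap_percent(n):6.2f}% larger when dividing by N - 1")
```

At N = 100 the gap is about 0.5%, which is why the distinction rarely matters for realistic sample sizes.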
However, it's possible that the book has this wrong. I haven't seen it to check, and we had such a horrible time with the typesetter that all kinds of things changed between what we wrote and the proofs that we saw.


