Chapter Three (Salkind)
Vive la Difference: Understanding Variability
An Overview of This Chapter
Descriptive Statistics
When we collect data, we have to organize that information in a way that makes it … informative!
As we saw in the first chapter of Salkind (“Statistics or Sadistics”), descriptive statistics help us “describe” data.
In chapter two, we discussed one specific type of descriptive statistics – measures of central tendency (the mean, median, and mode). You recall those, right?
Descriptive Statistics
Measures of central tendency (mean, median, and mode) are really helpful for describing a data set with a single central score, but they are not the full picture. Here we move on to another kind of measure: “measures of spread”
Fancy phrase, huh? Don’t fret yet, as it’s actually pretty easy to understand
Measures of spread, like measures of central tendency, are descriptive statistics that try to find a single number that best describes the variability in a data set
In this chapter we cover the following items …
Part One: Why Understanding Variability Is Important
Part Two: Computing the Range
Part Three: Computing the Variance
Part Four: Computing the Standard Deviation
Part Five: Using the Computer To Compute Variability
Part Six: An Eye Toward The Future
Part One
Why Understanding Variability Is Important
Why Variability Is Important
In Chapter 2 (Salkind), we discussed three measures of central tendency, or finding a single number that describes the central nature of a distribution (mean, median, and mode)
Here, we will discuss a single number that describes just how variable the data set is, using (again) three measures: the range, the variance, and the standard deviation
Before detailing these, consider variability in general …
What is the Mean?
Variability refers to how scores differ from one another. Consider the following scores:
Set A: 5, 8, 20, 23, 44
Set B: 17, 19, 20, 21, 23
What is the mean for Set A? 20
What is the mean for Set B? 20
What is the Median?
Variability refers to how scores differ from one another. Consider the following scores:
Set A: 5, 8, 20, 23, 44
Set B: 17, 19, 20, 21, 23
What is the median for Set A? 20
What is the median for Set B? 20
What is the Mode?
Variability refers to how scores differ from one another. Consider the following scores:
Set A: 5, 8, 20, 23, 44
Set B: 17, 19, 20, 21, 23
What is the mode for Set A? Every score occurs exactly once, so all 5 scores tie as modes
What is the mode for Set B? Same here – all 5 scores tie as modes
Which set has more variability?
Variability refers to how scores differ from one another. Consider the following scores:
Set A: 5, 8, 20, 23, 44
Set B: 17, 19, 20, 21, 23
But which set is more variable?
Set A ranges from 5 to 44. That’s quite a lot of variability
Set B ranges from 17 to 23. Still a little variability, but not nearly as much as set A.
Variability is Important
If we simply went with our measure of central tendency, we could conclude that the two sets are identical, but the fact that one set (Set A) has a much larger spread gives us another really important descriptive statistic to consider
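If you want to check this on a computer, here is a minimal sketch using Python's standard library. It is just one way to do it; the variable names are made up for illustration. It shows that Set A and Set B share a mean and a median but not a spread.

```python
from statistics import mean, median

# The two sets from the slides
set_a = [5, 8, 20, 23, 44]
set_b = [17, 19, 20, 21, 23]

for name, scores in (("Set A", set_a), ("Set B", set_b)):
    spread = max(scores) - min(scores)  # highest score minus lowest score
    print(name, "mean:", mean(scores), "median:", median(scores), "range:", spread)
```

Both sets report a mean and median of 20, yet the spread of Set A is far larger than the spread of Set B.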
Understanding Variability
Variability thus becomes a measure of how much each score in a group of scores differs from the mean
You might see variability expressed in different terms (such as fluctuation, lability, or error), but they all mean essentially the same thing: variability refers to the spread of scores
So why is variability important?
Understanding Variability: An Example
We’ve already talked about how experiments compare two or more groups. Consider a simple control group vs. experimental group design.
If we give the experimental group some treatment (let’s say we give them a drug to reduce social anxiety), we can look at how our experimental and control participants respond in a subsequent social situation
Our hope, of course, is that those in the experimental group will have less anxiety than those in the control group
Understanding Variability: An Example (2)
We expect variability between the experimental group and the control group. The set of social anxiety scores in the experimental group should be lower than the social anxiety scores for the control group. This is GOOD variability
But here is where variability gets trickier: Within the same group, there is error variability, something we discussed in Smith and Davis (Extraneous variables: Chapter 7)
Error Variability
Error is variability unrelated to your independent variable.
This variability can come from participant demographic characteristics, attitudes, different childhood experiences
It can come from uncontrolled factors in the experiment itself (computer freezes during a study, an alarm rings, a researcher laughs at something she shouldn’t)
Between-Group Variability (good)
Although researchers attempt to control variability as much as possible, error variability often gets into the design
If variability between conditions is high AND it is related to the independent variable, then this is good! We want our experimental group to vary from the control group
Within-Group Variability (bad)
Although researchers attempt to control variability as much as possible, error variability often gets into the design
But if variability within the same condition is high, this is bad. It means that group is highly variable regardless of the experimental manipulations.
It is also bad if there is variability between groups that is not related to the IV. This is our extraneous variable
Pop-Quiz 1: Quiz Yourself
In research studies, we want ____________ group variability to be high and ____________ group variability to be low
A). Within; Between
B). Between; Within
C). Between; Between
D). Within; Within
E). None of the above
Answer 1: B
In research studies, we want ____________ group variability to be high and ____________ group variability to be low
A). Within; Between
B). Between; Within
C). Between; Between
D). Within; Within
E). None of the above
Types of Variability: Range
So what do we do with variability (both the good kind and the bad kind)? Our first step is to understand the different types of variability, which we do by looking at descriptive statistics.
Consider the range:
Part Two
Computing the Range
To find the range, simply subtract the lowest from the highest score in your distribution of scores
This is the simplest and least informative measure of spread
All of the scores between the two extremes (the highest and lowest scores) are virtually ignored, and thus this measure is very sensitive to extreme scores
Okay, you recall our officer data from Chapter 2 (Salkind), right? Let’s return to that example …
1). The Range
Spousal assault cases over twelve months for ten police officers who responded to the calls
What is the Range for the # Arrests?
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 9 | 6 |
S3 | 48 | 12 |
S4 | 62 | 12 |
S5 | 26 | 24 |
S6 | 26 | 1 |
S7 | 84 | 65 |
S8 | 5 | 4 |
S9 | 26 | 8 |
S10 | 8 | 2 |
The Range: Example Data
What is the Range for the # Arrests?
84 – 5 = 79
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 9 | 6 |
S3 | 48 | 12 |
S4 | 62 | 12 |
S5 | 26 | 24 |
S6 | 26 | 1 |
S7 | 84 | 65 |
S8 | 5 | 4 |
S9 | 26 | 8 |
S10 | 8 | 2 |
The Range for # Arrests?
What is the Range for the # Convictions?
65 – 1 = 64
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 9 | 6 |
S3 | 48 | 12 |
S4 | 62 | 12 |
S5 | 26 | 24 |
S6 | 26 | 1 |
S7 | 84 | 65 |
S8 | 5 | 4 |
S9 | 26 | 8 |
S10 | 8 | 2 |
The Range for # Convictions?
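The subtraction above is easy to sketch in code. This is only an illustration: `value_range` is a made-up helper name, and the two lists are typed in from the officer table.

```python
def value_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

# Officer data from the table (S1 through S10)
arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]
convictions = [1, 6, 12, 12, 24, 1, 65, 4, 8, 2]

print(value_range(arrests))      # 84 - 5
print(value_range(convictions))  # 65 - 1
```

Notice that the function only ever looks at two scores per list, which is exactly why the range is the least informative measure of spread.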
Pop-Quiz 2: Quiz Yourself
What is the range in this table?
A). 93
B). 83
C). 72
D). 55
E). 33
x |
23 |
67 |
98 |
15 |
48 |
26 |
19 |
22 |
58 |
Answer 2: B
What is the range in this table?
A). 93
B). 83 (98-15 = 83)
C). 72
D). 55
E). 33
x |
23 |
67 |
98 |
15 |
48 |
26 |
19 |
22 |
58 |
Problems with the Range
The range doesn’t take into consideration the numbers falling between the two most extreme scores.
Consider the following graph. All three curves (black, blue, and red) have similar ranges (all reach out to around + or – 5), but their distributions look very different
Problems with using just the range
[Figure: three curves that all span roughly –5 to +5 but differ in shape: most scores close to the mean, scores more spread out, and scores most spread out]
The Range can be Misleading
The range thus gives us a general estimate of the differences in a data set, but it can be misleading.
Consider a slightly different officer data set …
1). The Range
Spousal assault cases over twelve months for ten police officers who responded to the calls
What is the Range for the # Arrests (go to the next slide for the answer!)
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 5 | 1 |
S3 | 5 | 1 |
S4 | 5 | 1 |
S5 | 5 | 1 |
S6 | 5 | 1 |
S7 | 84 | 65 |
S8 | 5 | 1 |
S9 | 5 | 1 |
S10 | 5 | 1 |
The Range: A Second Example Dataset
What is the Range for the # Arrests?
84 – 5 = 79
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 5 | 1 |
S3 | 5 | 1 |
S4 | 5 | 1 |
S5 | 5 | 1 |
S6 | 5 | 1 |
S7 | 84 | 65 |
S8 | 5 | 1 |
S9 | 5 | 1 |
S10 | 5 | 1 |
The Range for # Arrests (Example 2)
What is the Range for the # Convictions?
65 – 1 = 64
Officer | # Arrests | # Convictions |
S1 | 5 | 1 |
S2 | 5 | 1 |
S3 | 5 | 1 |
S4 | 5 | 1 |
S5 | 5 | 1 |
S6 | 5 | 1 |
S7 | 84 | 65 |
S8 | 5 | 1 |
S9 | 5 | 1 |
S10 | 5 | 1 |
The Range for # Convictions (Example 2)
Is the Range Adequate?
How adequately does a range of 79 arrests describe the data set when all but one officer has only 5 arrests? What about the convictions? Does 64 do an adequate job of describing the spread given that only one officer has more than one conviction? Misleading, right?
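To see the problem in numbers, here is a small sketch comparing the arrests in the two officer data sets. The names `first_arrests`, `second_arrests`, and `value_range` are made up for illustration.

```python
def value_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

first_arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]   # original officer data
second_arrests = [5, 5, 5, 5, 5, 5, 84, 5, 5, 5]       # second example data

# Both data sets have exactly the same range ...
same_range = value_range(first_arrests) == value_range(second_arrests)

# ... even though nearly every officer in the second set sits at the minimum
officers_at_minimum = second_arrests.count(min(second_arrests))

print(same_range, officers_at_minimum)
```

The range cannot tell these two data sets apart, even though they look nothing alike.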
Let’s turn to a statistic that better addresses the variability among data points: The Variance
Note: We’ll get to the standard deviation in Part Four
Part Three
Computing The Variance
Variance refers to a single number that represents the amount of variation in a distribution
The nice thing about the variance is that it is based on the squared deviation (or distance) of each score from the mean, averaged across the scores
(Trust me, this will make more sense when we talk about the standard deviation in the next section)
For now, I want to focus on the variance calculation itself
Variance Formula
Ok, time for our second knuckle-whitening, sweat-inducing, heart-rate raising hard statistical formula
Relax! All we need to do is plug numbers into this formula:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Variance Formula Components
Where:
s² = The variance (our goal in figuring out this formula!)
∑ = The Greek “sum of” sign.
X = Each individual score
X̄ = The mean of all of the scores (written M in the tables that follow).
n = The sample size
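The components listed above can be turned into a short function. Treat this as a sketch under assumptions: `sample_variance` is a hypothetical name, it divides by n – 1 as in the worked example later in this part, and it is tried out on Set B from Part One.

```python
def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n                                   # the mean
    sum_of_squares = sum((x - m) ** 2 for x in scores)    # sum of (X - M) squared
    return sum_of_squares / (n - 1)

set_b = [17, 19, 20, 21, 23]
print(sample_variance(set_b))
```

For Set B the deviations are -3, -1, 0, 1, and 3, so the squared deviations sum to 20, and 20 / 4 = 5.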
# Arrests |
95 |
84 |
62 |
48 |
26 |
26 |
12 |
9 |
8 |
5 |
5 |
4 |
What is the Variance for the # Arrests?
Don’t worry, I will walk you through this one!
First, let’s copy our twelve scores into a new table (each of these scores is “Xâ€)
What is the Variance for # Arrests?
X | (X-M) | (X-M)² | ||
95 | ||||
84 | ||||
62 | ||||
48 | ||||
26 | ||||
26 | ||||
12 | ||||
9 | ||||
8 | ||||
5 | ||||
5 | ||||
4 | ||||
∑ | Mean? | |||
Tip: you may want to make a table like this each time you calculate the variance
Calculating the Variance: Step 1
X | (X-M) | (X-M)² | ||
95 | ||||
84 | ||||
62 | ||||
48 | ||||
26 | ||||
26 | ||||
12 | ||||
9 | ||||
8 | ||||
5 | ||||
5 | ||||
4 | ||||
∑ | 384 / 12 = 32 | |||
Our mean (M, or X̄) is 32. Next, subtract M from EACH X (that is, X – M)
Calculating the Variance: Step 2
X | (X-M) | (X-M)² | ||
95 | 63 | |||
84 | ||||
62 | ||||
48 | ||||
26 | ||||
26 | ||||
12 | ||||
9 | ||||
8 | ||||
5 | ||||
5 | ||||
4 | ||||
∑ | 384 / 12 = 32 | |||
M = 32
95 – 32 = 63
Calculating the Variance: Step 3
X | (X-M) | (X-M)² | ||
95 | 63 | |||
84 | 52 | |||
62 | ||||
48 | ||||
26 | ||||
26 | ||||
12 | ||||
9 | ||||
8 | ||||
5 | ||||
5 | ||||
4 | ||||
∑ | 384 / 12 = 32 | |||
Calculating the Variance: Step 4
X | (X-M) | (X-M)² | ||
95 | 63 | |||
84 | 52 | |||
62 | 30 | |||
48 | ||||
26 | ||||
26 | ||||
12 | ||||
9 | ||||
8 | ||||
5 | ||||
5 | ||||
4 | ||||
∑ | 384 / 12 = 32 | |||
M = 32 |
Calculating the Variance: Step 5
X | (X-M) | (X-M)² | ||
95 | 63 | |||
84 | 52 | |||
62 | 30 | |||
48 | 16 | |||
26 | -6 | |||
26 | -6 | |||
12 | -20 | |||
9 | -23 | |||
8 | -24 | |||
5 | -27 | |||
5 | -27 | |||
4 | -28 | |||
∑ | 384 / 12 = 32 | |||
Want to see something cool? What happens when you add all of those (x-m) numbers? |
Calculating the Variance: Step 6
X | (X-M) | (X-M)² | ||
95 | 63 | |||
84 | 52 | |||
62 | 30 | |||
48 | 16 | |||
26 | -6 | |||
26 | -6 | |||
12 | -20 | |||
9 | -23 | |||
8 | -24 | |||
5 | -27 | |||
5 | -27 | |||
4 | -28 | |||
∑ | 384 / 12 = 32 | ZERO! | ||
Yup, they equal zero. It’s a good way to make sure you did your (X – M) correctly! |
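You can run the same sanity check in code. This is a minimal sketch with made-up variable names, using the twelve arrest scores from the table.

```python
arrests = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

m = sum(arrests) / len(arrests)        # the mean, 384 / 12
deviations = [x - m for x in arrests]  # X - M for every score

print(m)
print(deviations)
print(sum(deviations))                 # should come out to zero
```

With messier data the mean may not be a whole number, and a computer might then report a sum that is extremely close to zero rather than exactly zero. That is a rounding quirk, not a mistake in your (X – M) column.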
Calculating the Variance: Step 7
X | (X-M) | (X-M)² | ||
95 | 63² | 3969 | ||
84 | 52 | |||
62 | 30 | |||
48 | 16 | |||
26 | -6 | |||
26 | -6 | |||
12 | -20 | |||
9 | -23 | |||
8 | -24 | |||
5 | -27 | |||
5 | -27 | |||
4 | -28 | |||
∑ | 384 / 12 = 32 | |||
Our next step is to square each (X – M). That is, 63 × 63 = 3969 …
Calculating the Variance: Step 8
X | (X-M) | (X-M)² | ||
95 | 63² | 3969 | ||
84 | 52² | 2704 | ||
62 | 30 | |||
48 | 16 | |||
26 | -6 | |||
26 | -6 | |||
12 | -20 | |||
9 | -23 | |||
8 | -24 | |||
5 | -27 | |||
5 | -27 | |||
4 | -28 | |||
∑ | 384 / 12 = 32 | |||
52 × 52 = 2704, etc.
Calculating the Variance: Step 9
X | (X-M) | (X-M)² | ||
95 | 63² | 3969 | ||
84 | 52² | 2704 | ||
62 | 30² | 900 | ||
48 | 16² | 256 | ||
26 | (-6)² | 36 | ||
26 | (-6)² | 36 | ||
12 | (-20)² | 400 | ||
9 | (-23)² | 529 | ||
8 | (-24)² | 576 | ||
5 | (-27)² | 729 | ||
5 | (-27)² | 729 | ||
4 | (-28)² | 784 | ||
∑ | 384 / 12 = 32 | Add them! | ||
M = 32 |
Calculating the Variance: Step 10
X | (X-M) | (X-M)² | ||
95 | 63 | 3969 | ||
84 | 52 | 2704 | ||
62 | 30 | 900 | ||
48 | 16 | 256 | ||
26 | -6 | 36 | ||
26 | -6 | 36 | ||
12 | -20 | 400 | ||
9 | -23 | 529 | ||
8 | -24 | 576 | ||
5 | -27 | 729 | ||
5 | -27 | 729 | ||
4 | -28 | 784 | ||
∑ | 384 / 12 = 32 | 11648 | ||
Okay, so our M = 32, our n – 1 = 11 (based on 12 officers in the dataset), and ∑(X – M)² = 11648
Calculating the Variance: Step 11
X | (X-M) | (X-M)² | ||
95 | 63 | 3969 | ||
84 | 52 | 2704 | ||
62 | 30 | 900 | ||
48 | 16 | 256 | ||
26 | -6 | 36 | ||
26 | -6 | 36 | ||
12 | -20 | 400 | ||
9 | -23 | 529 | ||
8 | -24 | 576 | ||
5 | -27 | 729 | ||
5 | -27 | 729 | ||
4 | -28 | 784 | ||
∑ | 384 / 12 = 32 | 11648 | ||
So here is our formula again for variance: s² = ∑(X – M)² / (n – 1) = 11648 / 11 = 1058.91
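The whole table can be reproduced in a few lines of code. This is a sketch, not the only way to do it: the hand calculation mirrors the table columns, and `statistics.variance` from Python's standard library (which also divides by n – 1) serves as a cross-check.

```python
from statistics import variance

arrests = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

m = sum(arrests) / len(arrests)                        # M
sum_of_squares = sum((x - m) ** 2 for x in arrests)    # sum of the (X - M) squared column
s2 = sum_of_squares / (len(arrests) - 1)               # divide by n - 1

print(sum_of_squares)
print(round(s2, 2))
print(round(variance(arrests), 2))                     # library cross-check
```

Both routes should land on the same variance as the table, rounded to two decimals.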
Interpreting the Variance
Okay, so our variance for the set of officer data is 1058.91. But what the heck does this mean?
Consider our range again for this data set. It was 95 – 4 = 91
Given this range of arrests (remember that the most prolific arrester has 95 arrests while the lowest has only 4 arrests!), what does the 1058.91 variance really tell us?
We can’t really add 1058.91 to our range of 91, or subtract it from our range. It doesn’t make much sense, right? So …
What does the Variance Mean?
… what does our variance of 1058.91 really mean?
Unfortunately, not very much right now, because the variance is expressed in squared units (squared arrests), not in the same units (arrest counts) as the original data set.
That’s why we must take another step to understand the measure of spread: we compute the standard deviation. We’ll get to that in a second, but for now a Pop Quiz …
Pop-Quiz 3: Quiz Yourself
Which of the following is one way to represent the variance?
A). s
B). s²
C). s(2)
D). s / n
Answer 3: B
Which of the following is one way to represent the variance?
A). s
B). s²
C). s(2)
D). s / n
What does s² mean?
Think about that last pop quiz question. We have s². Well, what happens if we take the square root of s²?
Excellent question (if I do say so myself)! Let’s find out …
Part Four
Computing The Standard Deviation
The Standard Deviation
The standard deviation is roughly what it sounds like: the typical (standard) distance of scores from the mean!
You’ll often see the standard deviation expressed as s or SD
The larger the standard deviation, the farther the data points in the distribution are, on average, from the mean
On the next slide, I’ll show you the formula for both the SD and the variance. Can you tell me what the difference is?
Standard Deviation vs Variance
How do these formulas differ?
Variance is squared while the standard deviation is not!
If you calculate the variance first, then just take the square root and you’ll get the standard deviation!
Standard Deviation: $$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Variance: $$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Computing The Standard Deviation
Variance is squared while the standard deviation is not!
Okay, back to our officer data. Recall the 1058.91 variance
Pop-Quiz 4: Quiz Yourself
What is the square root of 1058.91?
A). 28.23
B). 31.15
C). 32.54
D). 47.99
E). 970.66
Answer 4: C
What is the square root of 1058.91?
A). 28.23
B). 31.15
C). 32.54
D). 47.99
E). 970.66
SD is the Square Root of the Variance
The standard deviation is the square root of the variance
Unlike variance, the standard deviation is expressed in the same units as original numbers (and is thus more useful)
Think about 32.54 in relation to our officer data …
# Arrests |
95 |
84 |
62 |
48 |
26 |
26 |
12 |
9 |
8 |
5 |
5 |
4 |
Mean = 32
SD = 32.54
We have a lot of variability here, with only four officers above the mean. The larger the SD, the more variability (here, given a mean of 32, an SD of 32.54 shows a lot of variability!)
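The square-root step for the officer data can be checked like this. It is a minimal sketch with made-up variable names; `statistics.stdev` is the standard library's sample standard deviation (it divides by n – 1) and is used only as a cross-check.

```python
import math
from statistics import stdev

arrests = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

variance_arrests = 11648 / 11          # sum of squared deviations / (n - 1)
sd_arrests = math.sqrt(variance_arrests)

print(round(sd_arrests, 2))
print(round(stdev(arrests), 2))        # library cross-check
```

Unlike the variance, this number is back in the original units, so it can be read as "about 32.54 arrests".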
So let’s consider a less variable data set ..
Interpreting the SD: Example
Mean = 31.83
SD = 3.13
Range = 36 – 26 = 10
This data set has a very similar mean to our prior data, but a very different SD (3.13, compared to 32.54)
You can also infer just by comparing the numbers here that this data set is much closer together (less spread)!
Interpreting the SD: Example 2
# Arrests |
26 |
27 |
30 |
30 |
31 |
32 |
33 |
34 |
34 |
34 |
35 |
36 |
I actually encourage you to take this new data set and calculate the SD yourself. See if you can replicate my 3.13
Try it on your own!
# Arrests |
26 |
27 |
30 |
30 |
31 |
32 |
33 |
34 |
34 |
34 |
35 |
36 |
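Once you have tried it by hand, you can check your answer with a sketch like this one. It assumes the twelve scores above and uses Python's standard library; the variable name is made up.

```python
from statistics import mean, stdev

arrests_2 = [26, 27, 30, 30, 31, 32, 33, 34, 34, 34, 35, 36]

print(round(mean(arrests_2), 2))           # the mean
print(round(stdev(arrests_2), 2))          # sample SD (divides by n - 1)
print(max(arrests_2) - min(arrests_2))     # the range
```

If your hand calculation and the computer disagree, compare your (X – M) column first, since that is where most slips happen.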
Steps In Computing The Standard Deviation
1. List the scores (this can be in any order)
2. Compute the mean for each group
3. Subtract the mean from each score
4. Square each individual difference (subtracted) score
5. Sum (add) the squared deviations
6. Divide by n – 1. This is the variance
7. Take the square root. This is the standard deviation
NOTE: The variance is bigger than the SD whenever the SD is greater than 1 (for an SD between 0 and 1, the variance is actually smaller)
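The seven steps above can be sketched as one function. This is an illustration under assumptions: `variance_and_sd` is a hypothetical name, and it is tried out on Set A from Part One.

```python
import math

def variance_and_sd(scores):
    """Follow the seven steps and return (variance, standard deviation)."""
    scores = list(scores)                              # 1. list the scores
    m = sum(scores) / len(scores)                      # 2. compute the mean
    deviations = [x - m for x in scores]               # 3. subtract the mean from each score
    squared = [d ** 2 for d in deviations]             # 4. square each difference
    sum_of_squares = sum(squared)                      # 5. sum the squared deviations
    variance = sum_of_squares / (len(scores) - 1)      # 6. divide by n - 1
    sd = math.sqrt(variance)                           # 7. take the square root
    return variance, sd

set_a = [5, 8, 20, 23, 44]
set_a_variance, set_a_sd = variance_and_sd(set_a)
print(set_a_variance, round(set_a_sd, 2))
```

For Set A the squared deviations are 225, 144, 0, 9, and 576, which sum to 954, so the variance is 954 / 4 = 238.5.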
Why Is the Standard Deviation Important?
Now that you know HOW to compute the variance and the standard deviation, it is important to understand WHY we compute them at all.
It all comes down to the normal curve. We’ve discussed this a bit already, but strap in for a more in-depth analysis!
The Normal Distribution
The standard deviation is very important when it comes to the normal curve (or normal distribution, or “bell curve”)
In a normal curve, the mean, median, and mode are all in the center, and the left side of the curve is the mirrored equivalent of the right side
More on this later in the semester!
We will discuss normal curves a lot more in later chapters, but for now it is important to know that a lot of the statistics we discuss assume that the scores we are looking at are normally distributed.
Because a lot of statistics rely on the mean, a mean that is pulled toward outliers (as in a skewed distribution) can violate those statistical assumptions.
How does this tie in to the SD? Well, the SD uses the mean in its formula, so if the mean is distorted by outliers, so is the SD
Another Quick Example
You and your friends have just measured the heights of your dogs (in millimeters from their paws to shoulders). This is what you got:
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm
What is the Mean?
What’s the mean for dog height?
The mean is 394 mm. That’s the easy part!
600 + 470 + 170 + 430 + 300 = 1970, and 1970 ÷ 5 = 394
Plotting the Differences
Now plot the difference between each dog and the mean score
600 – 394 = 206
470 – 394 = 76
Etc.
Subject # | X | (X-M) | (X-M)² | ||
S1 | 600 | 206 | 42436 | ||
S2 | 470 | 76 | 5776 | ||
S3 | 170 | -224 | 50176 | ||
S4 | 430 | 36 | 1296 | ||
S5 | 300 | -94 | 8836 | ||
∑ | 1970 / 5 = 394 | 108,520 | |||
M = 394, s² = 108,520 / (n – 1)
Calculating the Variance of the Dogs’ Heights: Step 1
Subject # | X | (X-M) | (X-M)² | ||
S1 | 600 | 206 | 42436 | ||
S2 | 470 | 76 | 5776 | ||
S3 | 170 | -224 | 50176 | ||
S4 | 430 | 36 | 1296 | ||
S5 | 300 | -94 | 8836 | ||
∑ | 1970 / 5 = 394 | 108,520 | |||
M = 394, s² = 108,520 / 4 = 27,130 (our variance)
Calculating the Variance of the Dogs’ Heights: Step 2
Standard Deviation for the Dog Example
The variance is what?
27,130
The standard deviation is the square root of the variance, so:
The standard deviation is what?
√27,130 ≈ 164.71 mm (or just 165 rounded)
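Here is the dog example from start to finish as a short sketch. The variable names are made up, and the calculation divides by n – 1 to match the table.

```python
import math

heights_mm = [600, 470, 170, 430, 300]

m = sum(heights_mm) / len(heights_mm)                    # mean height
sum_of_squares = sum((x - m) ** 2 for x in heights_mm)   # sum of squared deviations
dog_variance = sum_of_squares / (len(heights_mm) - 1)    # divide by n - 1
dog_sd = math.sqrt(dog_variance)                         # square root of the variance

print(m, sum_of_squares, dog_variance, round(dog_sd, 2))
```

The printed values should match the M, ∑(X – M)², variance, and SD worked out in the table.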
Interpreting the SD
The good thing about the standard deviation is that it is useful. We can show which heights are within 1 standard deviation (165 mm) of the mean:
Using the standard deviation we have a “standard” way of knowing what is normal for a dog, and what is extra large or extra small
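To see which dogs count as "normal" by this yardstick, here is a minimal sketch. The names `lower`, `upper`, and `within_one_sd` are made up for illustration, and the SD is the n – 1 version used in the table.

```python
from statistics import mean, stdev

heights_mm = [600, 470, 170, 430, 300]

m = mean(heights_mm)
sd = stdev(heights_mm)                # sample SD (divides by n - 1)

lower, upper = m - sd, m + sd         # one SD below and above the mean
within_one_sd = [h for h in heights_mm if lower <= h <= upper]

print(round(lower, 2), round(upper, 2))
print(within_one_sd)
```

The 600 mm dog falls above the band and the 170 mm dog falls below it, so those two are the "extra large" and "extra small" dogs in this sample.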
[Figure: dog heights plotted against the mean (394 mm), with brackets marking –1 SD and +1 SD around it]
Why n – 1? What’s Wrong With Just n?
In our standard deviation and variance formulas, we can use either n or n – 1. Why would we ever subtract 1?
As social scientists, psychologists feel it is better to err on the side of caution, and thus we tend to be conservative
If we must err, we err in favor of having too much variability (dividing by the smaller n – 1 gives a slightly larger result). I used n – 1 in my dog example for this reason. But …
Samples vs the Population
It also comes down to the sample versus the population.
If you have a whole population at your disposal (unlikely, but possible), you can use n (use the whole population!)
But if you draw a sample as I did with my dogs, it is best to go with n – 1, as the sample is merely an estimate of the population (and we want to be conservative in our estimate). Dividing by n – 1 gives us an unbiased estimate of the population variance
Using n vs n-1 in SD calculations
The larger your sample, the less difference there is between the biased estimate (n) and the unbiased estimate (n – 1).
Sample Size | Value of Numerator in SD | Biased Estimate of the Population SD (n) | Unbiased Estimate of the Population SD (n – 1) | Difference Between Biased and Unbiased |
10 | 500 | 7.07 | 7.45 | 0.38 |
100 | 500 | 2.24 | 2.25 | 0.01 |
1,000 | 500 | 0.7071 | 0.7075 | 0.0004 |
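The table rows can be regenerated with a few lines of code. This is a sketch: `sd_estimates` is a hypothetical helper, and the numerator of 500 is simply the value used in the table.

```python
import math

def sd_estimates(numerator, n):
    """Return (SD dividing by n, SD dividing by n - 1) for a given numerator."""
    biased = math.sqrt(numerator / n)
    unbiased = math.sqrt(numerator / (n - 1))
    return biased, unbiased

for n in (10, 100, 1000):
    biased, unbiased = sd_estimates(500, n)
    print(n, round(biased, 4), round(unbiased, 4), round(unbiased - biased, 4))
```

As n grows, subtracting 1 matters less and less, so the two estimates converge.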
Things to Remember about the SD
1. When using the standard deviation, don’t think about the mode and median. The SD only works with the mean
2. The larger the standard deviation, the more spread
3. Like the mean, the standard deviation is very sensitive to outliers (extreme scores)
4. If the standard deviation is zero, this indicates that there is NO variation at all (this is very rare)
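Points 3 and 4 are easy to see with the second officer data set. This is a small sketch with made-up variable names: nine officers with 5 arrests each have no variation at all, and adding the single officer with 84 arrests changes the SD dramatically.

```python
from statistics import stdev

nine_officers = [5] * 9                 # every officer has 5 arrests
with_outlier = nine_officers + [84]     # add the one extreme score

sd_without = stdev(nine_officers)       # no variation at all
sd_with = stdev(with_outlier)           # one outlier inflates the SD

print(sd_without, round(sd_with, 2))
```

One extreme score is enough to move the SD from zero to nearly 25 arrests.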
Pop-Quiz 5: Quiz Yourself
Do you want the standard deviation to be a big number or a small number?
A). Big
B). Small
C). It doesn’t really matter
D). I haven’t got a clue!
Answer 5: B
Do you want the standard deviation to be a big number or a small number?
A). Big
B). Small – The smaller the number, the less “bad” variation
C). It doesn’t really matter
D). I haven’t got a clue!