
Chapter Three (Salkind)

Vive la Difference: Understanding Variability

An Overview of This Chapter

Descriptive Statistics

When we collect data, we have to organize that information in a way that makes it … informative!

As we saw in the first chapter of Salkind (“Statistics or Sadistics”), descriptive statistics help us to “describe” data.

In chapter two, we discussed one specific type of descriptive statistics: measures of central tendency (the mean, median, and mode). You recall those, right?


Measures of central tendency (mean, median, and mode) are really helpful for describing a data set with a single central score, but they are not the full picture. Here we move on to another kind of measure: “measures of spread”

Fancy phrase, huh? Don’t fret yet, as it’s actually pretty easy to understand

Measures of spread, like measures of central tendency, are descriptive statistics that try to find a single number that best describes the variability in a data set

In this chapter we cover the following items …

Part One: Why Understanding Variability Is Important

Part Two: Computing the Range

Part Three: Computing the Variance

Part Four: Computing the Standard Deviation

Part Five: Using the Computer To Compute Variability

Part Six: An Eye Toward The Future


Part One

Why Understanding Variability Is Important


In Chapter 2 (Salkind), we discussed three measures of central tendency, or finding a single number that describes the central nature of a distribution (mean, median, and mode)

Here, we will discuss a single number that describes just how variable the data set is using (again) three measures: the range, the variance, and the standard deviation

Before detailing these, consider variability in general …

What is the Mean?

Variability refers to how scores differ from one another. Consider the following scores:

Set A: 5, 8, 20, 23, 44

Set B: 17, 19, 20, 21, 23

What is the mean for Set A? 20

What is the mean for Set B? 20

What is the Median?

What is the median for Set A? 20

What is the median for Set B? 20

What is the Mode?

What is the mode for Set A? Actually, there are 5 modes (each score occurs exactly once)

What is the mode for Set B? Same here: 5 modes

Which set has more variability?

But which set is more variable?

Set A ranges from 5 to 44. That’s quite a lot of variability

Set B ranges from 17 to 23. Still a little variability, but not nearly as much as Set A.

Variability is Important

If we simply went with our measure of central tendency, we could conclude that the two sets are identical, but the fact that one set (Set A) has a much larger spread gives us another really important descriptive statistic to consider
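If you want to check this for yourself, here is a minimal sketch in Python (the language and the variable names are my own choices for illustration, not something from Salkind). It uses the standard library's `statistics` module to show that the two sets share a mean and median while their spreads differ.

```python
from statistics import mean, median

# The two sets from the slides
set_a = [5, 8, 20, 23, 44]
set_b = [17, 19, 20, 21, 23]

for name, scores in (("Set A", set_a), ("Set B", set_b)):
    spread = max(scores) - min(scores)  # highest score minus lowest score
    print(name, "mean:", mean(scores), "median:", median(scores), "spread:", spread)
```

Both sets report a mean and median of 20, but the spread is 39 for Set A and only 6 for Set B.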


Understanding Variability

Variability thus becomes a measure of how much each score in a group of scores differs from the mean

You might see variability expressed in different terms (such as fluctuation, lability, or error), but they all mean essentially the same thing: variability refers to the spread of scores

So why is variability important?

Understanding Variability: An Example

We’ve already talked about how experiments compare two or more groups. Consider a simple control group vs. experimental group design.

If we give the experimental group some treatment (let’s say we give them a drug to reduce social anxiety), we can look at how our experimental and control participants respond in a subsequent social situation

Our hope, of course, is that those in the experimental group will have less anxiety than those in the control group

Understanding Variability: An Example (2)

We expect variability between the experimental group and the control group. The set of social anxiety scores in the experimental group should be lower than the social anxiety scores for the control group. This is GOOD variability

But here is where variability gets trickier: Within the same group, there is error variability, something we discussed in Smith and Davis (Extraneous variables: Chapter 7)

Error Variability

Error is variability unrelated to your independent variable.

This variability can come from participant demographic characteristics, attitudes, different childhood experiences

It can come from uncontrolled factors in the experiment itself (computer freezes during a study, an alarm rings, a researcher laughs at something she shouldn’t)

Between-Group Variability (good)

Although researchers attempt to control variability as much as possible, error variability often gets into the design

If variability between conditions is high AND it is related to the independent variable, then this is good! We want our experimental group to vary from the control group


But if variability within the same condition is high, this is bad. It means that group is highly variable regardless of the experimental manipulations.

It is also bad if there is variability between groups that is not related to the independent variable (IV). This is the work of an extraneous variable

Pop-Quiz 1: Quiz Yourself

In research studies, we want ____________ group variability to be high and ____________ group variability to be low

A). Within; Between

B). Between; Within

C). Between; Between

D). Within; Within

E). None of the above

Answer: B). Between; Within

Types of Variability: Range

So what do we do with variability (both the good kind and the bad kind)? Our first step is to understand the different types of variability, which we do by looking at descriptive statistics.

Consider the range:


Part Two

Computing the Range


To find the range, simply subtract the lowest from the highest score in your distribution of scores

This is the simplest and least informative measure of spread

All of the scores between the two extremes (the highest and lowest scores) are virtually ignored, and thus this measure is very sensitive to extreme scores

Okay, you recall our officer data from Chapter 2 (Salkind), right? Let’s return to that example …

1). The Range

The Range: Example Data

Spousal assault cases over twelve months for ten police officers who responded to the calls

What is the Range for the # Arrests?

Officer   # Arrests   # Convictions
S1        5           1
S2        9           6
S3        48          12
S4        62          12
S5        26          24
S6        26          1
S7        84          65
S8        5           4
S9        26          8
S10       8           2

The Range for # Arrests: 84 – 5 = 79

What is the Range for the # Convictions?

65 – 1 = 64
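The same subtraction can be sketched in a few lines of Python. This is only an illustration: `score_range` is a helper name I made up, and the lists simply copy the officer table.

```python
# Officer data from the table (S1 through S10)
arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]
convictions = [1, 6, 12, 12, 24, 1, 65, 4, 8, 2]

def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

print(score_range(arrests))      # 79
print(score_range(convictions))  # 64
```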

Pop-Quiz 2: Quiz Yourself

What is the range in this table?

A). 93

B). 83

C). 72

D). 55

E). 33

x: 23, 67, 98, 15, 48, 26, 19, 22, 58

Answer: B). 83 (98 – 15 = 83)

Problems with the Range

The range doesn’t take into consideration the numbers falling between the two most extreme scores.

Consider the following graph. All three curves (black, blue, and red) have similar ranges (all reach out to around + or – 5), but their distributions look very different

[Figure: Problems with using just the range. Three curves spanning roughly –5 to +5; annotation: most scores close to the mean]

The range thus gives us a general estimate of the differences in a data set, but it can be misleading.

Consider a slightly different officer data set …

1). The Range

The Range: A Second Example Dataset

Spousal assault cases over twelve months for ten police officers who responded to the calls

What is the Range for the # Arrests?

Officer   # Arrests   # Convictions
S1        5           1
S2        5           1
S3        5           1
S4        5           1
S5        5           1
S6        5           1
S7        84          65
S8        5           1
S9        5           1
S10       5           1

The Range for # Arrests (Example 2): 84 – 5 = 79

What is the Range for the # Convictions?

65 – 1 = 64

How adequately does a range of 79 arrests describe the data set when all but one officer has only 5 arrests? What about the convictions? Does 64 do an adequate job of describing the spread given that only one officer has more than one conviction? Misleading, right!
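To see the problem in numbers, here is a small hedged sketch in Python (the list names and the `score_range` helper are mine, chosen for illustration). It compares the arrest counts from the two officer data sets: the range is identical, even though the scores behind it are very different.

```python
first_dataset = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]   # arrests, first example
second_dataset = [5, 5, 5, 5, 5, 5, 84, 5, 5, 5]       # arrests, second example

def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

for label, arrests in (("first", first_dataset), ("second", second_dataset)):
    at_minimum = arrests.count(min(arrests))  # officers sitting at the lowest score
    print(label, "range:", score_range(arrests), "officers at the minimum:", at_minimum)
```

Both data sets report a range of 79, but in the second one nine of the ten officers sit at the minimum.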

Let’s turn to a descriptive statistic that better addresses the variability among data points: The Variance

Note: We’ll get to the standard deviation in Part Four

Part Three

Computing The Variance


Variance refers to a single number that represents the total amount of variation in a distribution

The nice thing about the variance is that it is built from the squared deviation (or distance) of each score from the mean

(Trust me, this will make more sense when we talk about the standard deviation in the next section)

For now, I want to focus on the variance calculation itself

Variance Formula

Ok, time for our second knuckle-whitening, sweat-inducing, heart-rate-raising hard statistical formula

Relax! All we need to do is plug numbers into this formula …

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Variance Formula Components

Where:

$s^2$ = The variance (our goal in figuring out this formula!)

$\sum$ = The Greek “sum of” sign.

$X$ = Each individual score

$\bar{X}$ = The mean of all of the scores (also written M).

$n$ = The sample size

What is the Variance for the # Arrests?

# Arrests: 95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4

Don’t worry, I will walk you through this one!

Calculating the Variance: Step 1

First, let’s copy our twelve scores into a new table (each of these scores is “X”). Tip: you may want to make a similar table to this each time you calculate the variance

Add the scores and divide by n to get the mean: 384 / 12 = 32. Our mean (M, or $\bar{X}$) is 32

Calculating the Variance: Steps 2–6

Next, subtract M from EACH X (that is, X – M): 95 – 32 = 63, 84 – 32 = 52, 62 – 32 = 30, and so on

X     (X – M)
95    63
84    52
62    30
48    16
26    -6
26    -6
12    -20
9     -23
8     -24
5     -27
5     -27
4     -28
∑     384 / 12 = 32

Want to see something cool? What happens when you add all of those (X – M) numbers?

ZERO! Yup, they equal zero. It’s a good way to make sure you did your (X – M) correctly!

Calculating the Variance: Steps 7–9

Our next step is to square each (X – M). That is, 63 × 63 = 3969, 52 × 52 = 2704, etc. Then add them!

X     (X – M)   (X – M)²
95    63        3969
84    52        2704
62    30        900
48    16        256
26    -6        36
26    -6        36
12    -20       400
9     -23       529
8     -24       576
5     -27       729
5     -27       729
4     -28       784
∑     384 / 12 = 32     11648

Calculating the Variance: Steps 10–11

Okay, so our M = 32, our n – 1 = 11 (based on 12 officers in the dataset), and ∑(X – M)² = 11648

So here is our formula again for variance:

$$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{11648}{11} = 1058.91$$
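The table-based walkthrough above can be mirrored in a short Python sketch. Treat it as one possible way to check the arithmetic rather than the official method; the variable names are my own.

```python
# The twelve arrest scores from the walkthrough (each one is an "X")
scores = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

n = len(scores)
m = sum(scores) / n                     # the mean: 384 / 12
deviations = [x - m for x in scores]    # X - M for each score
squared = [d ** 2 for d in deviations]  # (X - M) squared for each score
sum_sq = sum(squared)                   # sum of the squared deviations
variance = sum_sq / (n - 1)             # divide by n - 1 (sample variance)

print("mean:", m)
print("sum of deviations:", sum(deviations))  # should be zero
print("sum of squared deviations:", sum_sq)
print("variance:", round(variance, 2))
```

The sum-of-deviations line is the same sanity check as in the slides: if it is not zero, something went wrong in the (X – M) column.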

Interpreting the Variance

Okay, so our variance for the set of officer data is 1058.91. But what the heck does this mean?

Consider our range again for this data set. It was 95 – 4 = 91

Given this range of arrests (remember that the most prolific arrester has 95 arrests while the lowest has only 4 arrests!), what does the 1058.91 variance really tell us?

We can’t really add 1058.91 to our range of 91, or subtract it from our range. It doesn’t make much sense, right? So …

What does the Variance Mean?

… what does our variance of 1058.91 really mean?

Unfortunately, not very much right now, as the variance is not expressed in the same units (the same arrest numbers) as the original data set.

That’s why we must take another step to understand the measure of spread: We compute the standard deviation. We’ll get to that in a second, but for now a Pop Quiz …

Pop-Quiz 3: Quiz Yourself

Which of the following is one way to represent the variance?

A). s

B). s²

C). s(2)

D). s / n

Answer: B). s²

What does s² mean?

Think about that last pop quiz question. We have s². Well, what happens if we take the square root of s²?

Excellent question (if I do say so myself)! Let’s find out …

Part Four

Computing The Standard Deviation

The Standard Deviation

The standard deviation is pretty much what it sounds like: it’s the standard (typical) amount by which scores deviate from the mean!

You’ll often see the standard deviation expressed as s or SD

The larger the standard deviation, the farther the data points in the distribution are from the mean

On the next slide, I’ll show you the formula for both the SD and the variance. Can you tell me what the difference is?

Standard Deviation vs Variance

How do these formulas differ?

Standard Deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Variance

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Variance is squared while the standard deviation is not!

If you calculate the variance first, then just take the square root and you’ll get the standard deviation!

Okay, back to our officer data. Recall the 1058.91 variance

Pop-Quiz 4: Quiz Yourself

What is the square root of 1058.91?

A). 28.23

B). 31.15

C). 32.54

D). 47.99

E). 970.66

Answer: C). 32.54

SD is the Square Root of the Variance

The standard deviation is the square root of the variance

Unlike the variance, the standard deviation is expressed in the same units as the original numbers (and is thus more useful)

Think about 32.54 in relation to our officer data …

# Arrests: 95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4

Mean = 32

SD = 32.54

We have a lot of variability here, with only four officers above the mean. The larger the SD, the more variability (here, given a mean of 32, an SD of 32.54 shows a lot of variability!)
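Here is a hedged Python sketch of that last step, assuming you are happy to let the standard library's `statistics` module do the variance for you. Its `variance` and `stdev` functions use the n – 1 denominator, which matches our formula; the variable names are mine.

```python
import math
from statistics import stdev, variance

arrests = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

s2 = variance(arrests)  # sample variance (n - 1 in the denominator)
s = math.sqrt(s2)       # the SD is the square root of the variance

print(round(s2, 2))  # 1058.91
print(round(s, 2))   # 32.54

# stdev() takes the same route in a single call
print(math.isclose(s, stdev(arrests)))
```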

So let’s consider a less variable data set …

Interpreting the SD: Example 2

# Arrests: 26, 27, 30, 30, 31, 32, 33, 34, 34, 34, 35, 36

Mean = 31.83

SD = 3.13

Range = 36 – 26 = 10

This data set has a very similar mean to our prior data, but a very different SD (3.13, compared to 32.54)

You can also see just by comparing the numbers here that the scores in this data set are much closer together (less spread)!

I actually encourage you to take this new data set and calculate the SD yourself. See if you can replicate my 3.13
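Once you have tried it by hand, one way to check your answer is a quick Python sketch like the one below. It assumes the standard library's `statistics` module, whose `stdev` uses n – 1; the list name is my own.

```python
from statistics import mean, stdev

arrests = [26, 27, 30, 30, 31, 32, 33, 34, 34, 34, 35, 36]

print(round(mean(arrests), 2))      # 31.83
print(round(stdev(arrests), 2))     # 3.13
print(max(arrests) - min(arrests))  # 10
```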

Steps In Computing The Standard Deviation

1. List the scores (this can be in any order)

2. Compute the mean for each group

3. Subtract the mean from each score

4. Square each individual difference (subtracted) score

5. Sum (add) the squared deviations

6. Divide by n – 1. This is the variance

7. Take the square root. This is the standard deviation

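NOTE: The variance is bigger than the SD whenever the variance is greater than 1 (as in all of our examples). If the variance is between 0 and 1, the SD is actually the bigger number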
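The seven steps above can be sketched as a single Python function. This is a minimal illustration, not the one true implementation: `sample_variance_and_sd` is a name I invented, and the comments map each line back to a step.

```python
import math

def sample_variance_and_sd(scores):
    """Follow the seven steps and return (variance, standard deviation)."""
    values = list(scores)                             # Step 1: list the scores
    m = sum(values) / len(values)                     # Step 2: compute the mean
    differences = [x - m for x in values]             # Step 3: subtract the mean
    squared = [d ** 2 for d in differences]           # Step 4: square each difference
    total = sum(squared)                              # Step 5: sum the squared deviations
    variance = total / (len(values) - 1)              # Step 6: divide by n - 1
    return variance, math.sqrt(variance)              # Step 7: take the square root

# Usage with the officer arrest data from earlier
s2, s = sample_variance_and_sd([95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4])
print(round(s2, 2), round(s, 2))  # 1058.91 32.54
```

Note that the function needs at least two scores, because step 6 divides by n – 1.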

Why Is the Standard Deviation Important?

Now that you know HOW to compute the variance and the standard deviation, it is important to understand WHY we compute them at all.

It all comes down to the normal curve. We’ve discussed this a bit already, but strap in for a more in-depth analysis!

The Normal Distribution

The standard deviation is very important when it comes to the normal curve (or normal distribution, or “bell curve”)

In a normal curve, the mean, median, and mode are all in the center, and the left side of the curve is the mirrored equivalent of the right side

More on this later in the semester!

We will discuss normal curves a lot more in later chapters, but for now it is important to know that a lot of the statistics we discuss assume that the scores we are looking at are normally distributed.

Because a lot of statistics rely on the mean, a mean that is pulled toward outliers (a “skewed” mean) can violate those statistical assumptions.

How does this tie in to the SD? Well, the SD uses the mean in its formula, so if the mean is “skewed”, so is the SD

Another Quick Example

You and your friends have just measured the heights of your dogs (in millimeters from their paws to shoulders). This is what you got:

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm

What’s the mean for dog height?

The mean is 394 mm. That’s the easy part!

(600 + 470 + 170 + 430 + 300) ÷ 5 = 1970 ÷ 5 = 394 mm

Plotting the Differences

Now plot the difference between each dog and the mean score

600 – 394 = 206

470 – 394 = 76

Etc.

Calculating the Variance (Dog’s Height)

Subject #   X      (X – M)   (X – M)²
S1          600    206       42436
S2          470    76        5776
S3          170    -224      50176
S4          430    36        1296
S5          300    -94       8836
∑           1970 / 5 = 394   108,520

M = 394, s² = 108,520 / (n – 1) = 108,520 / 4 = 27,130 (our variance)

Standard Deviation for the Dog Example

The variance is what?

27,130

The standard deviation is the square root of the variance, so:

The standard deviation is what?

√27,130 ≈ 164.71 mm (or just 165 rounded)
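As a cross-check on the dog example, here is a short Python sketch. It assumes the standard library's `statistics` module (which divides by n – 1, as we did), and `heights_mm` is simply my name for the five heights.

```python
from statistics import mean, stdev, variance

heights_mm = [600, 470, 170, 430, 300]  # shoulder heights in millimeters

m = mean(heights_mm)       # 1970 / 5
s2 = variance(heights_mm)  # 108,520 / 4
s = stdev(heights_mm)      # square root of the variance

print(m, s2, round(s, 2), round(s))
```

The last two values show the SD before and after rounding to the nearest millimeter.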

Interpreting the SD

The good thing about the standard deviation is that it is useful. We can show which heights are within 1 standard deviation (165 mm) of the mean:

Using the standard deviation we have a “standard” way of knowing what is normal for a dog, and what is extra large or extra small

[Figure: the dog heights plotted around the mean (Mean = 394), with brackets marking – 1 SD and + 1 SD]
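To make the “within 1 standard deviation” idea concrete, here is a small Python sketch. It takes the mean (394) and the rounded SD (165) straight from the slides; the variable names and the way I split the list are my own illustration.

```python
heights_mm = [600, 470, 170, 430, 300]
mean_mm = 394  # mean from the slides
sd_mm = 165    # rounded SD from the slides

low = mean_mm - sd_mm   # 1 SD below the mean
high = mean_mm + sd_mm  # 1 SD above the mean

within = [h for h in heights_mm if low <= h <= high]
outside = [h for h in heights_mm if h < low or h > high]

print("band:", low, "to", high)    # band: 229 to 559
print("within 1 SD:", within)      # within 1 SD: [470, 430, 300]
print("outside 1 SD:", outside)    # outside 1 SD: [600, 170]
```

So three dogs fall inside the band, one (600 mm) is unusually tall, and one (170 mm) is unusually short.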

Why n – 1? What’s Wrong With Just n?

In our standard deviation and variance formulas, we can use either n or n – 1. Why would we ever subtract 1?

As social scientists, psychologists feel it is better to err on the side of caution, and thus we tend to be conservative

If we must err, err in favor of having too much variability. I used n – 1 in my dog example for this reason. But …

Samples vs the Population

It also comes down to the sample versus the population.

If you have a whole population at your disposal (unlikely, but possible), you can use n (use the whole population!)

But if you draw a sample as I did with my dogs, it is best to go with n – 1, as the sample is merely an estimate of the population (and we want to be conservative in our estimate). This gives us an unbiased estimate

Using n vs n-1 in SD calculations

The larger your sample, the less difference there is between the biased estimate (n) and the unbiased estimate (n – 1).

Sample Size   Value of Numerator in SD   Biased Estimate of the Population SD (n)   Unbiased Estimate of the Population SD (n – 1)   Difference Between Biased and Unbiased
10            500                        7.07                                       7.45                                             0.38
100           500                        2.24                                       2.25                                             0.01
1,000         500                        0.7071                                     0.7075                                           0.0004
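The pattern in this table can be reproduced with a few lines of Python. This is a sketch under the table's own assumption that the numerator (the sum of squared deviations) stays fixed at 500; `sd_estimates` is a helper name I made up.

```python
import math

def sd_estimates(numerator, n):
    """Return (biased SD using n, unbiased SD using n - 1) for a fixed numerator."""
    biased = math.sqrt(numerator / n)
    unbiased = math.sqrt(numerator / (n - 1))
    return biased, unbiased

for n in (10, 100, 1000):
    biased, unbiased = sd_estimates(500, n)
    print(n, round(biased, 4), round(unbiased, 4), round(unbiased - biased, 4))
```

As n grows, dividing by n or by n – 1 makes less and less difference, which is exactly what the last column shows.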


Things to Remember about the SD

1. When using the standard deviation, don’t think about the mode and median. The SD only works with the mean

2. The larger the standard deviation, the more spread

3. Like the mean, the standard deviation is very sensitive to outliers (extreme scores)

4. If the standard deviation is zero, this indicates that there is NO variation at all (this is very rare)
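Points 3 and 4 can be illustrated with the second officer data set from Part Two (nine officers with 5 arrests, one with 84). The Python sketch below is only an illustration; it assumes the standard library's `statistics` module and uses names I picked myself.

```python
from statistics import mean, stdev

arrests = [5, 5, 5, 5, 5, 5, 84, 5, 5, 5]           # one extreme score (84)
without_outlier = [x for x in arrests if x != 84]   # drop the extreme score

# With the outlier: both the mean and the SD are pulled upward
print(mean(arrests), round(stdev(arrests), 2))

# Without it: every score is identical, so there is no variation at all
print(mean(without_outlier), stdev(without_outlier))
```

One extreme score is enough to move the SD from zero to roughly 25 arrests.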

Pop-Quiz 5: Quiz Yourself

Do you want the standard deviation to be a big number or a small number?

A). Big

B). Small

C). It doesn’t really matter

D). I haven’t got a clue!