
Chapter Three (Salkind)

Vive la Difference: Understanding Variability

An Overview of This Chapter

Descriptive Statistics

When we collect data, we have to organize that information in a way that makes it … informative!

As we saw in the first chapter of Salkind (“Statistics or Sadistics”), descriptive statistics help us “describe” data.

In chapter two, we discussed one specific type of descriptive statistics – measures of central tendency (the mean, median, and mode). You recall those, right?


Measures of central tendency (mean, median, and mode) are really helpful for summarizing a data set with a single central score, but they are not the full picture. Here we move on to another kind of descriptive statistic: “measures of spread”

Fancy phrase, huh? Don’t fret yet, as it’s actually pretty easy to understand

Measures of spread, like measures of central tendency, are descriptive statistics that try to find a single number that best describes the variability in a data set


In this chapter we cover the following items …

Part One: Why Understanding Variability Is Important

Part Two: Computing the Range

Part Three: Computing the Variance

Part Four: Computing the Standard Deviation

Part Five: Using the Computer To Compute Variability

Part Six: An Eye Toward The Future


Part One

Why Understanding Variability Is Important


In Chapter 2 (Salkind), we discussed three measures of central tendency, or finding a single number that describes the central nature of a distribution (mean, median, and mode)

Here, we will discuss a single number that describes just how variable the data set is using (again) three measures: the range, the variance, and the standard deviation

Before detailing these, consider variability in general …


What is the Mean?

Variability refers to how scores differ from one another. Consider the following scores:

Set A: 5, 8, 20, 23, 44

Set B: 17, 19, 20, 21, 23

What is the mean for Set A? 20

What is the mean for Set B? 20


What is the Median?

Variability refers to how scores differ from one another. Consider the following scores:

Set A: 5, 8, 20, 23, 44

Set B: 17, 19, 20, 21, 23

What is the median for Set A? 20

What is the median for Set B? 20


What is the Mode?

Variability refers to how scores differ from one another. Consider the following scores:

Set A: 5, 8, 20, 23, 44

Set B: 17, 19, 20, 21, 23

What is the mode for Set A? Every score occurs exactly once, so no score stands out (you could say there are 5 modes)

What is the mode for Set B? Same here: each of the 5 scores occurs once


Which set has more variability?

Variability refers to how scores differ from one another. Consider the following scores:

Set A: 5, 8, 20, 23, 44

Set B: 17, 19, 20, 21, 23

But which set is more variable?

Set A ranges from 5 to 44. That’s quite a lot of variability

Set B ranges from 17 to 23. Still a little variability, but not nearly as much as set A.


Variability is Important

If we simply went with our measure of central tendency, we could conclude that the two sets are identical, but the fact that one set (Set A) has a much larger spread gives us another really important descriptive statistic to consider
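As a quick check of this comparison, here is a minimal Python sketch (the names `set_a` and `set_b` are just illustrative labels, not anything from Salkind) showing that the two sets share a mean and a median but differ sharply in spread:

```python
from statistics import mean, median

set_a = [5, 8, 20, 23, 44]
set_b = [17, 19, 20, 21, 23]

for name, scores in (("Set A", set_a), ("Set B", set_b)):
    spread = max(scores) - min(scores)  # highest score minus lowest score
    print(name, "mean:", mean(scores), "median:", median(scores), "spread:", spread)
```

Both sets report a mean and median of 20, yet the distance from lowest to highest score is 39 for Set A and only 6 for Set B.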


Understanding Variability

Variability thus becomes a measure of how much each score in a group of scores differs from the mean

You might see variability expressed in different terms (such as fluctuation, lability, or error), but they all mean essentially the same thing: variability refers to the spread of scores

So why is variability important?


Understanding Variability: An Example

We’ve already talked about how experiments compare two or more groups. Consider a simple control group vs. experimental group design.

If we give the experimental group some treatment (let’s say we give them a drug to reduce social anxiety), we can look at how our experimental and control participants respond in a subsequent social situation

Our hope, of course, is that those in the experimental group will have less anxiety than those in the control group


Understanding Variability: An Example (2)

We expect variability between the experimental group and the control group. The set of social anxiety scores in the experimental group should be lower than the social anxiety scores for the control group. This is GOOD variability

But here is where variability gets trickier: Within the same group, there is error variability, something we discussed in Smith and Davis (Extraneous variables: Chapter 7)


Error Variability

Error is variability unrelated to your independent variable.

This variability can come from participants’ demographic characteristics, attitudes, or different childhood experiences

It can come from uncontrolled factors in the experiment itself (computer freezes during a study, an alarm rings, a researcher laughs at something she shouldn’t)


Between-Group Variability (good)

Although researchers attempt to control variability as much as possible, error variability often gets into the design

If variability between conditions is high AND it is related to the independent variable, then this is good! We want our experimental group to vary from the control group


Within-Group Variability (bad)

Although researchers attempt to control variability as much as possible, error variability often gets into the design

But if variability within the same condition is high, this is bad. It means that group is highly variable regardless of the experimental manipulations.

It is also bad if there is variability between groups that is not related to the IV. This is our extraneous variable


Pop-Quiz 1: Quiz Yourself

In research studies, we want ____________ group variability to be high and ____________ group variability to be low

A). Within; Between

B). Between; Within

C). Between; Between

D). Within; Within

E). None of the above

Answer 1: B

In research studies, we want ____________ group variability to be high and ____________ group variability to be low

A). Within; Between

B). Between; Within

C). Between; Between

D). Within; Within

E). None of the above

Types of Variability: Range

So what do we do with variability (both the good kind and the bad kind)? Our first step is to understand the different types of variability, which we do by looking at descriptive statistics.

Consider the range:


Part Two

Computing the Range


To find the range, simply subtract the lowest from the highest score in your distribution of scores

This is the simplest and least informative measure of spread

All of the scores between the two extremes (the highest and lowest scores) are virtually ignored, and thus this measure is very sensitive to extreme scores

Okay, you recall our officer data from Chapter 2 (Salkind), right? Let’s return to that example …

1). The Range

Spousal assault cases over twelve months for ten police officers who responded to the calls

What is the Range for the # Arrests

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 1
S7 84 65
S8 5 4
S9 26 8
S10 8 2

The Range: Example Data


What is the Range for the # Arrests?

84 – 5 = 79

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 1
S7 84 65
S8 5 4
S9 26 8
S10 8 2

The Range for # Arrests?

What is the Range for the # Convictions?

65 – 1 = 64

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 1
S7 84 65
S8 5 4
S9 26 8
S10 8 2

The Range for # Convictions?
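The two range calculations above can be sketched in Python as follows. The helper name `score_range` is an illustrative choice, and the lists simply copy the officer table:

```python
# Arrest and conviction counts for officers S1..S10, copied from the table above
arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]
convictions = [1, 6, 12, 12, 24, 1, 65, 4, 8, 2]

def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

print("Range of arrests:", score_range(arrests))          # 84 - 5
print("Range of convictions:", score_range(convictions))  # 65 - 1
```

Note that only two scores per column (the highest and the lowest) ever enter the calculation.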

Pop-Quiz 2: Quiz Yourself

What is the range in this table?

A). 93

B). 83

C). 72

D). 55

E). 33

x
23
67
98
15
48
26
19
22
58

Answer 2: B

What is the range in this table?

A). 93

B). 83 (98-15 = 83)

C). 72

D). 55

E). 33

x
23
67
98
15
48
26
19
22
58

Problems with the Range

The range doesn’t take into consideration the numbers falling between the two most extreme scores.

Consider the following graph. All three curves (black, blue, and red) have similar ranges (all reach out to around + or – 5), but their distributions look very different

[Figure: Problems with using just the range. Three curves share a range of about –5 to +5: one with most scores close to the mean, one with scores more spread out, and one with scores most spread out.]


The Range can be Misleading

The range thus gives us a general estimate of the differences in a data set, but it can be misleading.

Consider a slightly different officer data set …

1). The Range

Spousal assault cases over twelve months for ten police officers who responded to the calls

What is the Range for the # Arrests (go to the next slide for the answer!)

# Arrests # Convictions
S1 5 1
S2 5 1
S3 5 1
S4 5 1
S5 5 1
S6 5 1
S7 84 65
S8 5 1
S9 5 1
S10 5 1

The Range: A Second Example Dataset

What is the Range for the # Arrests

84 – 5 = 79

# Arrests # Convictions
S1 5 1
S2 5 1
S3 5 1
S4 5 1
S5 5 1
S6 5 1
S7 84 65
S8 5 1
S9 5 1
S10 5 1

The Range for # Arrests (Example 2)

What is the Range for the # Convictions

65 – 1 = 64

# Arrests # Convictions
S1 5 1
S2 5 1
S3 5 1
S4 5 1
S5 5 1
S6 5 1
S7 84 65
S8 5 1
S9 5 1
S10 5 1

The Range for # Convictions (Example 2)

Is the Range Adequate?

How adequately does a range of 79 arrests describe the data set when all but one officer has only 5 arrests? What about the convictions? Does 64 do an adequate job of describing the spread given that only one officer has more than one conviction? Misleading, right?

Let’s turn to a statistic that better addresses the variability among data points: The Variance
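To make the problem concrete, here is a small Python sketch (variable names are illustrative) comparing the arrests column of the first officer data set with the second one. The ranges match even though the data look nothing alike:

```python
from collections import Counter

first_arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]   # original officer data
second_arrests = [5, 5, 5, 5, 5, 5, 84, 5, 5, 5]       # second example data

for label, scores in (("first", first_arrests), ("second", second_arrests)):
    score_range = max(scores) - min(scores)
    # The single most frequent score and how many officers share it
    most_common_score, count = Counter(scores).most_common(1)[0]
    print(f"{label}: range = {score_range}, "
          f"most common score = {most_common_score} ({count} of {len(scores)} officers)")
```

Both data sets report a range of 79, but in the second one nine of the ten officers have exactly the same score.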

Note: We’ll get to the standard deviation in Part Four

Part Three

Computing The Variance


Variance refers to a single number that represents the total amount of variation in a distribution

The nice thing about the variance is that it is built from the squared deviation (or distance) of each score from the mean

(Trust me, this will make more sense when we talk about the standard deviation in the next section)

For now, I want to focus on the variance calculation itself

Variance Formula

Ok, time for our second knuckle-whitening, sweat-inducing, heart-rate raising hard statistical formula

Relax! All we need to do is plug numbers into this formula:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Variance Formula Components

Where:

$s^2$ = The variance (our goal in figuring out this formula!)

∑ = The Greek “sum of” sign.

X = Each individual score

$\bar{X}$ (also written M) = The mean of all of the scores.

n = The sample size

# Arrests
95
84
62
48
26
26
12
9
8
5
5
4

What is the Variance for the # Arrests?

Don’t worry, I will walk you through this one!

First, let’s copy our twelve scores into a new table (each of these scores is “X”)

What is the Variance for # Arrests?

X (X-M) (X-M)²
95
84
62
48
26
26
12
9
8
5
5
4
∑ Mean?

Tip: you may want to make a table similar to this each time you calculate the variance

Calculating the Variance: Step 1

X (X-M) (X-M)²
95
84
62
48
26
26
12
9
8
5
5
4
∑ 384 / 12 = 32
Our mean (M, or X̄) is 32. Next, subtract M from EACH X (that is, X – M)

Calculating the Variance: Step 2

X (X-M) (X-M)²
95 63
84
62
48
26
26
12
9
8
5
5
4
∑ 384 / 12 = 32
M = 32

95 – 32 = 63

Calculating the Variance: Step 3

X (X-M) (X-M)²
95 63
84 52
62
48
26
26
12
9
8
5
5
4
∑ 384 / 12 = 32

Calculating the Variance: Step 4

X (X-M) (X-M)²
95 63
84 52
62 30
48
26
26
12
9
8
5
5
4
∑ 384 / 12 = 32
M = 32

Calculating the Variance: Step 5

X (X-M) (X-M)²
95 63
84 52
62 30
48 16
26 -6
26 -6
12 -20
9 -23
8 -24
5 -27
5 -27
4 -28
∑ 384 / 12 = 32
Want to see something cool? What happens when you add all of those (X – M) numbers?

Calculating the Variance: Step 6

X (X-M) (X-M)²
95 63
84 52
62 30
48 16
26 -6
26 -6
12 -20
9 -23
8 -24
5 -27
5 -27
4 -28
∑ 384 / 12 = 32 ZERO!
Yup, they equal zero. It’s a good way to make sure you did your (X – M) correctly!
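Why does this always work? A short derivation, using the same symbols as the table (X for each score, M for the mean, n for the number of scores), shows that the deviations from the mean must cancel out for any data set:

```latex
\sum (X - M) = \sum X - nM = \sum X - n \cdot \frac{\sum X}{n} = \sum X - \sum X = 0
```

For the officer data this is 384 – (12 × 32) = 384 – 384 = 0.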

Calculating the Variance: Step 7

X (X-M) (X-M)²
95 63² 3969
84 52
62 30
48 16
26 -6
26 -6
12 -20
9 -23
8 -24
5 -27
5 -27
4 -28
∑ 384 / 12 = 32
Our next step is to square each (X – M). That is, 63 × 63 = 3969 …

Calculating the Variance: Step 8

X (X-M) (X-M)²
95 63² 3969
84 52² 2704
62 30
48 16
26 -6
26 -6
12 -20
9 -23
8 -24
5 -27
5 -27
4 -28
∑ 384 / 12 = 32
52 × 52 = 2704, etc.

Calculating the Variance: Step 9

X (X-M) (X-M)²
95 63² 3969
84 52² 2704
62 30² 900
48 16² 256
26 (-6)² 36
26 (-6)² 36
12 (-20)² 400
9 (-23)² 529
8 (-24)² 576
5 (-27)² 729
5 (-27)² 729
4 (-28)² 784
∑ 384 / 12 = 32 Add them!
M = 32

Calculating the Variance: Step 10

X (X-M) (X-M)²
95 63 3969
84 52 2704
62 30 900
48 16 256
26 -6 36
26 -6 36
12 -20 400
9 -23 529
8 -24 576
5 -27 729
5 -27 729
4 -28 784
∑ 384 / 12 = 32 11648
Okay, so our M = 32, our n – 1 = 11 (based on 12 officers in the dataset), and ∑(X – M)² = 11648

Calculating the Variance: Step 11

X (X-M) (X-M)²
95 63 3969
84 52 2704
62 30 900
48 16 256
26 -6 36
26 -6 36
12 -20 400
9 -23 529
8 -24 576
5 -27 729
5 -27 729
4 -28 784
∑ 384 / 12 = 32 11648
So here is our formula again for variance: $s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{11648}{11} \approx 1058.91$
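The whole table-building exercise can be reproduced with a few lines of Python. This is only a sketch of the hand calculation (variable names such as `deviations` and `squared` are illustrative), with the standard library's `statistics.variance` used as a cross-check because it also divides by n – 1:

```python
from statistics import variance

arrests = [95, 84, 62, 48, 26, 26, 12, 9, 8, 5, 5, 4]

n = len(arrests)
m = sum(arrests) / n                      # mean: 384 / 12
deviations = [x - m for x in arrests]     # the (X - M) column
squared = [d ** 2 for d in deviations]    # the (X - M) squared column

# Print the three columns of the table, one row per officer
for x, d, sq in zip(arrests, deviations, squared):
    print(f"{x:>3} {d:>6.0f} {sq:>6.0f}")

sum_of_squares = sum(squared)
s2 = sum_of_squares / (n - 1)             # divide by n - 1, not n
print("Sum of deviations:", sum(deviations))
print("Sum of squared deviations:", sum_of_squares)
print("Variance:", round(s2, 2))
print("Library check:", round(variance(arrests), 2))
```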

Interpreting the Variance

Okay, so our variance for the set of officer data is 1058.91. But what the heck does this mean?

Consider the range for this data set. It is 95 – 4 = 91

Given this range of arrests (remember that the most prolific arrester has 95 arrests while the lowest has only 4 arrests!), what does the 1058.91 variance really tell us?

We can’t really add 1058.91 to our range of 91, or subtract it from our range. It doesn’t make much sense, right? So …

What does the Variance Mean?

… what does our variance of 1058.91 really mean?

Unfortunately, not very much right now, as the variance is not expressed in the same units (the same arrest numbers) as the original data set. Because every deviation was squared, the variance is in squared units.

That’s why we must take another step to understand the measure of spread: we compute the standard deviation. We’ll get to that in a second, but for now a Pop Quiz …

Pop-Quiz 3: Quiz Yourself

Which of the following is one way to represent the variance?

A). s

B). s²

C). s(2)

D). s / n

Answer 3: B

Which of the following is one way to represent the variance?

A). s

B). s²

C). s(2)

D). s / n

What does s² mean?

Think about that last pop quiz question. We have s². Well, what happens if we take the square root of s²?

Excellent question (if I do say so myself)! Let’s find out …

Part Four

Computing The Standard Deviation

The Standard Deviation

The standard deviation is pretty much what it sounds like: it’s the standard (roughly, the average) amount by which scores deviate from the mean!

You’ll often see the standard deviation expressed as s or SD

The larger the standard deviation, the farther the data points in the distribution are, on average, from the mean

On the next slide, I’ll show you the formula for both the SD and the variance. Can you tell me what the difference is?

Standard Deviation vs Variance

How do these formulas differ?

Variance is squared while the standard deviation is not!

If you calculate the variance first, just take the square root and you’ll get the standard deviation!

Standard Deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Computing The Standard Deviation

Variance is squared while the standard deviation is not!

Okay, back to our officer data. Recall the 1058.91 variance

Pop-Quiz 4: Quiz Yourself

What is the square root of 1058.91?

A). 28.23

B). 31.15

C). 32.54

D). 47.99

E). 970.66

Answer 4: C

What is the square root of 1058.91?

A). 28.23

B). 31.15

C). 32.54

D). 47.99

E). 970.66

SD is the Square Root of the Variance

The standard deviation is the square root of the variance

Unlike variance, the standard deviation is expressed in the same units as original numbers (and is thus more useful)

Think about 32.54 in relation to our officer data …

# Arrests
95
84
62
48
26
26
12
9
8
5
5
4

Mean = 32

SD = 32.54

We have a lot of variability here, with only four officers above the mean. The larger the SD, the more variability (here, given a mean of 32, an SD of 32.54 shows a lot of variability!)

So let’s consider a less variable data set …

Interpreting the SD: Example

Mean = 31.83

SD = 3.13

Range = 36 – 26 = 10

This data set has a very similar mean to our prior data, but a very different SD (3.13, compared to 32.54)

You can also infer just by comparing the numbers here that this data set is much closer together (less spread)!

Interpreting the SD: Example 2

# Arrests
26
27
30
30
31
32
33
34
34
34
35
36

I actually encourage you to take this new data set and calculate the SD yourself. See if you can replicate my 3.13

Try it on your own!

# Arrests
26
27
30
30
31
32
33
34
34
34
35
36

Steps In Computing The Standard Deviation

1. List the scores (this can be in any order)

2. Compute the mean for each group

3. Subtract the mean from each score

4. Square each individual difference (subtracted) score

5. Sum (add) the squared deviations

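6. Divide by n – 1. This is the variance

7. Take the square root. This is the standard deviation

NOTE: The variance is bigger than the SD whenever the variance is greater than 1, as it is in all of our examples. For a variance between 0 and 1, the SD is actually the larger number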
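The seven steps above can be sketched as a single Python function. The name `sample_sd` is hypothetical (it is not from Salkind or any library), and each line is tagged with the step it carries out:

```python
import math

def sample_sd(scores):
    """Follow the seven steps above; returns (variance, standard deviation)."""
    scores = list(scores)                           # 1. list the scores
    m = sum(scores) / len(scores)                   # 2. compute the mean
    deviations = [x - m for x in scores]            # 3. subtract the mean from each score
    squared = [d ** 2 for d in deviations]          # 4. square each deviation
    sum_of_squares = sum(squared)                   # 5. sum the squared deviations
    variance = sum_of_squares / (len(scores) - 1)   # 6. divide by n - 1
    return variance, math.sqrt(variance)            # 7. take the square root

# The less variable arrest data from the "Try it on your own!" slide
arrests = [26, 27, 30, 30, 31, 32, 33, 34, 34, 34, 35, 36]
s2, sd = sample_sd(arrests)
print("Variance:", round(s2, 2), "SD:", round(sd, 2))
```

If your hand calculation went well, the SD printed here should match the 3.13 reported on the earlier slide.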

Why Is the Standard Deviation Important?

Now that you know HOW to compute the variance and the standard deviation, it is important to understand WHY we compute them at all.

It all comes down to the normal curve. We’ve discussed this a bit already, but strap in for a more in-depth analysis!

The Normal Distribution

The standard deviation is very important when it comes to the normal curve (or normal distribution, or “bell curve”)

In a normal curve, the mean, median, and mode are all in the center, and the left side of the curve is the mirrored equivalent of the right side

More on this later in the semester!

We will discuss normal curves a lot more in later chapters, but for now it is important to know that a lot of the statistics we discuss assume that the scores we are looking at are normally distributed.

Because a lot of statistics rely on the mean, a mean that is pulled toward outliers (as in a skewed distribution) can violate those statistical assumptions.

How does this tie in to the SD? Well, the SD uses the mean in its formula, so if the mean is “skewed”, so is the SD

Another Quick Example

You and your friends have just measured the heights of your dogs (in millimeters from their paws to shoulders). This is what you got:

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm

What is the Mean?


What’s the mean for dog height?

The mean is 394 mm. That’s the easy part!

(600 + 470 + 170 + 430 + 300) ÷ 5 = 1970 ÷ 5 = 394

Plotting the Differences

Now plot the difference between each dog and the mean score

600 – 394 = 206

470 – 394 = 76

Etc.


Subject # X (X-M) (X-M)²
S1 600 206 42436
S2 470 76 5776
S3 170 -224 50176
S4 430 36 1296
S5 300 -94 8836
∑ 1970 / 5 = 394 108,520
M = 394, s² = 108,520 / (n – 1)

Calculating the Variance of the Dogs’ Heights: Step 1

Subject # X (X-M) (X-M)²
S1 600 206 42436
S2 470 76 5776
S3 170 -224 50176
S4 430 36 1296
S5 300 -94 8836
∑ 1970 / 5 = 394 108,520
M = 394, s² = 108,520 / 4 = 27,130 (our variance)

Calculating the Variance of the Dogs’ Heights: Step 2

Standard Deviation for the Dog Example

The variance is what?

27,130

The standard deviation is the square root of variance, so:

The standard deviation is what?

s = √27,130 ≈ 164.71 (or just 165 rounded)

Interpreting the SD

The good thing about the standard deviation is that it is useful. We can show which heights are within 1 standard deviation (165 mm) of the mean:

Using the standard deviation we have a “standard” way of knowing what is normal for a dog, and what is extra large or extra small
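As a rough illustration of that idea, the Python sketch below recomputes the dog example and labels each height relative to a band of one standard deviation around the mean. The variable names and the three wording labels are illustrative choices, not part of the original slides:

```python
import math

heights_mm = [600, 470, 170, 430, 300]

n = len(heights_mm)
m = sum(heights_mm) / n                                   # mean height
s2 = sum((x - m) ** 2 for x in heights_mm) / (n - 1)      # sample variance
sd = math.sqrt(s2)                                        # sample standard deviation

low, high = m - sd, m + sd                                # the band within 1 SD of the mean
for x in heights_mm:
    if x < low:
        label = "more than 1 SD below the mean"
    elif x > high:
        label = "more than 1 SD above the mean"
    else:
        label = "within 1 SD of the mean"
    print(f"{x} mm: {label}")
```

With these five dogs, the 600 mm dog falls above the band, the 170 mm dog falls below it, and the other three sit within one SD of the mean.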

[Figure: dog heights with the mean (394 mm) and the –1 SD and +1 SD bands marked]


Why n – 1? What’s Wrong With Just n?

In our standard deviation and variance formulas, we can use either n or n – 1. Why would we ever subtract 1?

As social scientists, psychologists feel it is better to err on the side of caution, and thus we tend to be conservative

If we must err, err in favor of having too much variability. I used n – 1 in my dog example for this reason. But …

Samples vs the Population

It also comes down to the sample versus the population.

If you have a whole population at your disposal (unlikely, but possible), you can use n (use the whole population!)

But if you draw a sample as I did with my dogs, it is best to go with n – 1, as the sample is merely an estimate of the population (and we want to be conservative in our estimate). Dividing by n – 1 gives us an unbiased estimate of the population variance

Using n vs n-1 in SD calculations

The larger your sample, the less difference there is between the biased estimate (n) and the unbiased estimate (n – 1).

Sample Size | Value of Numerator in SD | Biased Estimate of the Population SD (n) | Unbiased Estimate of the Population SD (n – 1) | Difference Between Biased and Unbiased
10 | 500 | 7.07 | 7.45 | 0.38
100 | 500 | 2.24 | 2.25 | 0.01
1,000 | 500 | 0.7071 | 0.7075 | 0.0004
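The table values can be recomputed with a short Python sketch. It assumes, as the table does, that the numerator (the sum of squared deviations) stays fixed at 500 while only the sample size changes. The labels "biased" and "unbiased" follow the table's column headings:

```python
import math

numerator = 500  # sum of squared deviations, held fixed as in the table

for n in (10, 100, 1000):
    biased = math.sqrt(numerator / n)          # divide by n
    unbiased = math.sqrt(numerator / (n - 1))  # divide by n - 1
    print(f"n = {n:>5}: biased = {biased:.4f}, unbiased = {unbiased:.4f}, "
          f"difference = {unbiased - biased:.4f}")
```

The gap between the two estimates shrinks quickly as n grows, which is the point of the table.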


Things to Remember about the SD

1. When using the standard deviation, don’t think about the mode and median. The SD is computed from deviations around the mean, so it only works with the mean

2. The larger the standard deviation, the more spread

3. Like the mean, the standard deviation is very sensitive to outliers (extreme scores)

4. If the standard deviation is zero, this indicates that there is NO variation at all (this is very rare)
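Points 3 and 4 can be seen together in a small Python sketch that reuses the arrests column from the second officer data set (nine officers with 5 arrests and one with 84). Dropping the extreme score here is purely for illustration, not a recommendation to delete data:

```python
from statistics import mean, stdev

with_outlier = [5, 5, 5, 5, 5, 5, 84, 5, 5, 5]          # second officer arrest data set
without_outlier = [x for x in with_outlier if x != 84]  # drop the one extreme score

print("With the outlier:    mean =", mean(with_outlier),
      "SD =", round(stdev(with_outlier), 2))
print("Without the outlier: mean =", mean(without_outlier),
      "SD =", round(stdev(without_outlier), 2))
```

A single extreme score is enough to move the SD from zero (no variation at all) to roughly 25 arrests. `statistics.stdev` divides by n – 1, matching the formula used in this chapter.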

Pop-Quiz 5: Quiz Yourself

Do you want the standard deviation to be a big number or a small number?

A). Big

B). Small

C). It doesn’t really matter

D). I haven’t got a clue!

Answer 5: B

Do you want the standard deviation to be a big number or a small number?

A). Big

B). Small – The smaller the number, the less “bad” variation

C). It doesn’t really matter

D). I haven’t got a clue!
