Statistics 528 Lecture 3: Graphical and Numerical Summaries of Data, Study notes of Statistics

A transcript of a university statistics lecture on graphical and numerical summaries of data. Topics include identifying the shape of a distribution through modes, symmetry, and outliers, as well as measures of the center (mean and median) and measures of spread (IQR and standard deviation). The lecture also covers choosing appropriate summaries and changing the units of measurement.

What you will learn

  • What are the different types of modes in a distribution?
  • How does symmetry differ from skewness in a distribution?
  • What are outliers and how can they affect measures of the center and spread?

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

johnatan
Statistics 528 - Lecture 3 Prof. Kate Calder

Section 1.1/1.2

Graphical and Numerical Summaries of Data

  • Shape of a Distribution
    • Modes
    • Symmetric vs. Skewed
    • Outliers
  • Measures of the Center
    • mean
    • median
  • Measures of Spread
    • IQR
    • standard deviation
  • Choosing Summaries of Distributions
  • Changing the Units of Measurement

Modes

  • Question: Does the distribution have one or several major peaks? Look at histograms and stemplots.
  • A distribution with one major peak is called unimodal. A distribution with two major peaks is called bimodal.
  • Example of a bimodal distribution: scores on an exam

Symmetric vs. Skewed

  • A distribution is symmetric if the values larger or smaller than the midpoint are mirror images of each other.
  • A distribution is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller values).
  • A distribution is skewed to the left if the left tail (smaller values) is much longer than the right tail (larger values).

[Figure: three example distributions, labeled Left Skewed, Symmetric, and Right Skewed]

Outliers

Outliers – values that fall outside the overall pattern and are far from the bulk of the data

  • Can be a result of natural variation.
  • Or, can be evidence of a mistake (equipment failure, incorrect recording of an observation, etc.).

Removing an outlier? Big Decision

Measures of the Center

Two different ideas for the “center” of a distribution; the two can give very different values.

  • Mean - “average value”

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

or,

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

  • Median - “middle value”

a) Sort the observations from smallest to largest.
b) If n is odd (n = number of observations), median = middle value of the sorted list = the ((n+1)/2)th observation up from the bottom of the list.
c) If n is even, median = mean of the middle two observations.
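
Steps a) to c) can be sketched in Python as follows. This is a minimal illustration, not part of the lecture; the function name `median` and the sample lists are chosen here for the example.

```python
def median(values):
    """Median following steps a) to c): sort, then take the middle value(s)."""
    ordered = sorted(values)  # a) sort from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # b) n odd: the ((n+1)/2)th observation, which is index n // 2
        #    when positions are counted from 0
        return ordered[middle]
    # c) n even: mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 1, 3]))     # odd n: middle value is 3
print(median([4, 1, 3, 2]))  # even n: (2 + 3) / 2 = 2.5
```

Python's standard library offers the same calculation as `statistics.median`.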

Mean vs. Median

  • The median is a more resistant measure of the center of a distribution than the mean; i.e., the median is not as affected by extreme observations (long tails, outliers).

Mean vs. Median Applet - example of a dot plot (http://bcs.whfreeman.com/ips4e/default.asp)
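
The resistance of the median can also be checked numerically. The sketch below uses made-up values (not data from the lecture) and replaces the largest observation with an extreme one.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 20]            # made-up values for illustration
with_outlier = [10, 12, 13, 15, 200]   # largest value replaced by an extreme one

# Original data: mean = 70 / 5 = 14, median = 13
print(mean(data), median(data))

# With the outlier: mean = 250 / 5 = 50, median is still 13
print(mean(with_outlier), median(with_outlier))
```

One extreme observation moves the mean from 14 to 50, while the median does not move at all.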

  • Left skewed: mean < median
  • Symmetric: mean = median
  • Right skewed: mean > median
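
As a rough check of this pattern, the sketch below compares the mean and median of three small made-up samples. The samples are assumptions chosen for illustration, and the relationship is a typical tendency rather than a guarantee for every data set.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # long right tail (made-up values)
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("left skewed", left_skewed),
                     ("symmetric", symmetric),
                     ("right skewed", right_skewed)]:
    print(name, "mean:", mean(sample), "median:", median(sample))
```

Here the right-skewed sample has mean 4.75 and median 3, its mirror image has mean -4.75 and median -3, and the symmetric sample has mean and median both equal to 3.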

Example: Phyllis received 6 HW grades in her statistics class:

86 88 92 44 89 90

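Her mean grade is: (86 + 88 + 92 + 44 + 89 + 90)/6 = 489/6 = 81.5

Her median grade is: sort the grades,

44 86 88 89 90 92

and take the mean of the middle two: (88 + 89)/2 = 88.5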

Question: Does the mean, 81.5, give a good idea of her “typical” grade?

No, it is lower than all but one of her grades.

Question: What about the median, 88.5?

88.5 is more typical.
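
The numbers in this example can be reproduced with Python's standard `statistics` module. The variable names are chosen here for illustration.

```python
from statistics import mean, median

grades = [86, 88, 92, 44, 89, 90]   # Phyllis's six HW grades

mean_grade = mean(grades)       # 489 / 6 = 81.5
median_grade = median(grades)   # mean of the middle two sorted grades, 88 and 89

# Grades that fall below the mean: only the 44
below_mean = [g for g in grades if g < mean_grade]

print(mean_grade, median_grade, below_mean)
```

Only one grade lies below the mean, which is why the mean is a poor description of her typical grade here.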

Choosing a Summary

  • The median, IQR, or five-number summary are better than the mean and the standard deviation for describing a skewed distribution or a distribution with outliers.
  • The mean and standard deviation should only be used for describing symmetric distributions with no outliers.
  • Why should we ever use the mean and standard deviation?

Answer: Together they completely specify a normal distribution, which allows us to perform statistical inference easily.
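
One way to see this is to write out the normal density. The symbols \(\mu\) (mean) and \(\sigma\) (standard deviation) are not used in the slides and are assumed here:

```latex
\[
  f(x) = \frac{1}{\sigma\sqrt{2\pi}}
         \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
\]
```

The only parameters in the formula are \(\mu\) and \(\sigma\), so knowing the mean and standard deviation fixes the whole curve.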

Changing the Unit of Measurement

Linear transformations: x_new = a + bx

  • a (constant) shifts all of the values of x up or down by the same amount
  • b (positive constant) changes the size of the unit of measurement
  • A linear transformation will not change the shape of a distribution.
  • Multiplying each observation by a positive constant b multiplies both measures of the center (mean and median) and measures of spread (IQR and standard deviation) by b.
  • Adding the same number a (either positive or negative) to each observation adds a to the measures of the center (mean and median) and to the quartiles (and other percentiles) but does not change measures of spread.
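
The rules above can be checked numerically. The sketch below converts made-up Celsius temperatures to Fahrenheit (a = 32, b = 1.8) and compares the summaries. The data, the helper names `iqr` and `summarize`, and the quartile convention (the default of `statistics.quantiles`) are assumptions for illustration; the lecture may define quartiles slightly differently.

```python
import math
from statistics import mean, median, quantiles, stdev


def iqr(values):
    """Interquartile range using the default convention of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def summarize(values):
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": iqr(values),
        "sd": stdev(values),   # sample standard deviation
    }


celsius = [12.0, 15.5, 18.0, 21.0, 30.0]    # made-up temperatures
a, b = 32.0, 1.8                            # x_new = a + b * x
fahrenheit = [a + b * c for c in celsius]

old, new = summarize(celsius), summarize(fahrenheit)

# Measures of center are shifted by a and scaled by b
for name in ("mean", "median"):
    print(name, new[name], "expected", a + b * old[name])

# Measures of spread are scaled by b only; adding a has no effect
for name in ("iqr", "sd"):
    print(name, new[name], "expected", b * old[name])
```

The printed pairs agree up to floating-point rounding.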