Understanding Hypothesis Testing: Null & Alternative Hypotheses, Errors, Significance - Pr, Exams of Data Analysis & Statistical Methods

An introduction to hypothesis testing, focusing on the concepts of null and alternative hypotheses, types of errors, and significance levels. The text uses the example of testing the fairness of a coin to illustrate these concepts. It also discusses the importance of setting a level of significance and calculating a test statistic to make a conclusion.

Typology: Exams

Pre 2010

Uploaded on 08/09/2009

koofers-user-0bm

STA100 Lecture 19
Text: Sections 9.1, 9.2

Hypothesis Tests
We test hypotheses and draw inferences all the time. Suppose you want to buy a used car. At the risk of oversimplifying, you might buy a "lemon" or you might buy a "terrific" car. Unless you are a mechanic, you probably won't really know which type the car is until after you have made your decision and spent thousands of dollars.

If you think about it, there are four possibilities: buy a lemon, buy a great car, pass up a lemon, or pass up a great car. You might try to "play the averages" by consulting Kelley Blue Book or some other site to read reviews, etc., but this doesn't really tell you about the car in front of you.

It is much the same thing in a court case. If you are on a jury, you must decide whether the person who has been accused actually did the crime or not. Even if they confess, you'll never really know whether they are guilty, but you still must make a decision.

Here's a table summarizing the situation:

                            The accused actually      The accused actually
                            did not do the crime      did do the crime
You find them guilty        Incorrect choice          Correct choice
You find them not guilty    Correct choice            Incorrect choice

As you can see there are two types of errors possible here: sending an innocent person to jail and
letting a guilty person go free. No court system can possibly be error free all the time. Given that the
world is imperfect, how do we proceed intelligently?

A little less dramatically, suppose you have a coin in front of you, and you and I are betting $1 on each coin toss. You probably don't trust me entirely, so I allow you to examine the game coin. What would you do to ensure the game is fair?

Since you don't have a "coin-o-meter" in your laboratory, I guess you're going to start tossing. Suppose you toss 1000 times and obtain 540 Heads. At this point can you reasonably conclude anything? Since we know about population parameters like the population proportion, and about probability models like the binomial distribution, we have everything we need to move forward.

Here is what we are really asking about the coin: "Is the long-term proportion of heads delivered by the coin 50%?" If we call the population proportion of heads delivered by the coin $p$, we are asking "Is $p = 0.5$?" All we can do at this point is construct a confidence interval and see whether it includes 0.5. Do you see that, while you must make a choice as to whether the coin is fair or not, you can never be sure? It's possible for a fair coin to give 540 heads on 1000 tosses, just as it is possible for an unfair coin to give 540 heads on 1000 tosses. The interesting question is: Is it likely?

Here's a 95% confidence interval:

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$0.54 \pm 1.96\sqrt{\frac{0.54(1-0.54)}{1000}}$$

Your best guess is that the coin is not fair since the interval does not include 0.5.
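The interval arithmetic can be checked with a short Python sketch. The function name `proportion_ci` is made up for this illustration; it simply evaluates the normal-approximation formula for the 540-out-of-1000 example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                              # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)    # z times the standard error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(540, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")      # roughly (0.509, 0.571)
print("interval contains 0.5:", low <= 0.5 <= high)
```

Because the lower end of the interval sits just above 0.5, a fair coin is not among the plausible values at this confidence level.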

Before we proceed much further we need to develop a few terms. These are all defined in your textbook; take a moment to look them up before seeing what I have to say about them.

  1. Null and Alternative Hypothesis
  2. Types of Errors
  3. Left tailed, right tailed, and two tailed tests

was fair. This is called a Type I error. The other type of error is failing to detect that the coin was biased (heh, heh, heh…) when in fact the coin was not fair. This is called a Type II error. So, we have:

                       H_0 is true                       H_0 is not true
Reject H_0             Incorrect choice, Type I error    Correct choice
Do not reject H_0      Correct choice                    Incorrect choice, Type II error

In our court case, even if you vote "guilty" you are never absolutely sure that the accused committed the crime. You don't need to be sure, just believe guilt "beyond a reasonable doubt", not beyond all doubt. It's the same in the coin toss. Even if you get 1000 Heads on 1000 tosses (an event with probability 9.3326…e-302 for a fair coin; that's 301 zeros before the 93326…) you are not absolutely sure the coin is biased. Tiny is not zero. Minuscule is not zero. Thus you have to draw the line somewhere. It is common in statistical tests to use 5% as the level of significance. This is the Type I error rate we are willing to accept: the chance of rejecting $H_0$ when it is in fact true. We call this number $\alpha$ (alpha) and say that

the level of significance ≡ $\alpha$ ≡ the probability of rejecting a true null hypothesis

If you would like your level of significance to be zero you haven't been paying attention. The only way to ensure that no innocent people go to jail is to create a system in which absolutely everyone is set free. That would be chaos.
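To see what a 5% level of significance means in practice, here is a small simulation sketch. It assumes a genuinely fair coin, repeats the 1000-toss experiment many times, and counts how often a two-tailed z-test at the 5% level wrongly rejects. The seed, the number of experiments, and the helper names are arbitrary choices made for this illustration.

```python
import math
import random

def count_heads(n, rng):
    """Toss a fair coin n times: each of the n random bits is one toss."""
    return bin(rng.getrandbits(n)).count("1")

def rejects_fair_coin(heads, n, z_crit=1.96):
    """Two-tailed z-test of H0: p = 0.5 at the 5% level."""
    se = math.sqrt(0.5 * 0.5 / n)      # standard error under H0
    z = (heads / n - 0.5) / se
    return abs(z) > z_crit

rng = random.Random(19)                # arbitrary seed so the run is repeatable
n_tosses, n_experiments = 1000, 2000
false_alarms = sum(
    rejects_fair_coin(count_heads(n_tosses, rng), n_tosses)
    for _ in range(n_experiments)
)
type_i_rate = false_alarms / n_experiments
print(f"Rejected a fair coin in {type_i_rate:.1%} of experiments")
```

The observed rate should land near 5%. It will not be exactly 5% because the number of heads is a whole number and the simulation itself is random.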

On the other hand, we need to control our Type II error rate as well. Since we call the probability of committing a Type I error (rejecting a true null) $\alpha$, you won't be surprised that we call the probability of committing a Type II error (failing to reject a false null) $\beta$. Type II errors are a little harder to deal with and we pretty much ignore them in this course. I can't leave the topic before mentioning that people define the power of a statistical test as the probability that we reject a null hypothesis when it is false, which is $1 - \beta$.
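Although the course sets Type II errors aside, a rough sketch shows why they are harder: $\beta$ depends on how biased the coin really is, which we never know. The code below assumes, purely for illustration, a true proportion of 0.54 and uses the normal approximation. The function name `power_two_tailed` and the choice of 0.54 are not from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(p_true, n, p0=0.5, alpha=0.05):
    """Approximate (power, beta) for a two-tailed z-test of H0: p = p0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sqrt(p0 * (1 - p0) / n)   # half-width of the "do not reject" zone
    # Sampling distribution of the sample proportion if the coin's true value is p_true
    sampling = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))
    beta = sampling.cdf(p0 + margin) - sampling.cdf(p0 - margin)
    return 1 - beta, beta

power, beta = power_two_tailed(p_true=0.54, n=1000)   # 0.54 is an assumed true value
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Under these assumptions the test catches the biased coin roughly 70% of the time. A coin closer to fair would be caught less often.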

Once we know what we want to test (we know $H_0$ and $H_1$) and we know how often we are willing to make a mistake in the direction of rejecting a true null (we know $\alpha$), we must consider some sort of evidence. To do this in a statistical test we compute some relevant statistic. In our coin toss example we should calculate the probability that we would get 540 or more Heads on 1000 tosses with a fair coin. If this probability is too small (typically if it is less than 5%) we will conclude that the coin is not fair. If you think about it for a moment, we would consider a coin that delivered too few Heads to be biased as well. Thus we will calculate the probability that a fair coin would give a result at least as extreme as 540 Heads, in either direction. Use the normal approximation to the binomial random variable and compute as follows:

• Convert the target value to standard units:

$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.54 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{1000}}} \approx 2.5298$$

• Determine how likely it is to see a data point (statistic) this far away from the hypothesized population parameter:

$$P(-2.5298 < z < 2.5298) \approx P(-2.53 < z < 2.53) = 0.9943 - 0.0057 = 0.9886$$

So, the probability of observing a test statistic as extreme as ours is 1 − 0.9886 = 0.0114. Not impossible, but pretty unlikely. In fact we agreed to reject our null hypothesis if we saw data that were less likely than 5%. Since 0.0114 < 0.05 we reject the null hypothesis and conclude that the coin is not fair.
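The same calculation can be sketched in Python without a z table, using the standard library's `statistics.NormalDist`. The function name `two_sided_proportion_test` is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_proportion_test(successes, n, p0):
    """z statistic and two-tailed probability for H0: p = p0 (normal approximation)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se
    tail_prob = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, tail_prob

z, tail_prob = two_sided_proportion_test(540, 1000, p0=0.5)
print(f"z = {z:.4f}, probability of a result this extreme = {tail_prob:.4f}")
print("reject H0 at the 5% level:", tail_prob < 0.05)
```

The result agrees with the table lookup to about four decimal places. Any small difference comes from rounding $z$ to 2.53 before using the table.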

Notice that I was sneaky and made you compute a $p$-value without even telling you about it. We define a $p$-value as the probability, computed assuming the null hypothesis is true, of observing data at least as extreme as the data you collected. Note that we reject our null hypothesis when we observe unlikely data, thus

We reject the null hypothesis when our $p$-value is smaller than $\alpha$.
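For the coin example, the $p$-value can also be computed directly from the binomial distribution rather than through the normal approximation. The sketch below is one way to do that. The function name is made up, and "at least as extreme" is taken to mean at least as far from 500 heads in either direction.

```python
from math import comb

def exact_two_sided_p_value(heads, n):
    """P(|X - n/2| >= |heads - n/2|) for X ~ Binomial(n, 0.5), a fair coin."""
    distance = abs(heads - n / 2)
    # Count the equally likely toss sequences whose head count is at least as extreme
    favorable = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return favorable / 2 ** n

p_exact = exact_two_sided_p_value(540, 1000)
alpha = 0.05
print(f"exact two-tailed p-value = {p_exact:.4f}")
print("reject H0" if p_exact < alpha else "do not reject H0")
```

The exact value is slightly larger than the normal-approximation figure, but it is still well below 5%, so the conclusion is unchanged.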

Assume for the sake of the test that $H_0: \mu_{Fibro} = 50$. Since we feel that these folks will tend to have a lower Life Satisfaction and we wish to refute the null hypothesis, we will conduct a one-tailed test and take as our alternative $H_1: \mu_{Fibro} < 50$. We will then reject the null hypothesis if our data are "too low". Make sure you read about one- versus two-tailed tests in your textbook!

  1. Decide upon a level of significance, $\alpha$.

When we do this we really aren't doing math anymore, just trying to proceed as best we reasonably can. It is very common to take $\alpha = 0.05$. It is also common to use $\alpha = 0.01$. To see this in action we will use this smaller value here.

  1. Compute a test statistic ($z$, $t$, $\chi^2$, and $F$ are popular stats).

Since we are dealing with means and with large sample sizes we will compute a $z$ statistic. Remember that

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{48 - 50}{10/\sqrt{n}}$$

  1. Find the $p$-value corresponding to your test statistic (for a left tailed, a right tailed, or a two tailed test).

We find the probability that $z < -2.28$. From our z table I get $p = 0.0113$. Since this number is not less than $\alpha = 0.01$, we may not reject the null hypothesis at the $\alpha = 0.01$ level of significance. (Note that if we had been less picky we could have rejected at the $\alpha = 0.05$ level of significance. It all depends upon where you initially set up the goal posts.)

  1. Form a conclusion: if $p < \alpha$ (improbable data) reject $H_0$, otherwise do not reject. We never accept, just like the courts never say that someone is innocent.

We were not able to reject the null hypothesis at the stated level of significance. These data are insufficient to conclude that individuals with Fibromyalgia have a lower Life Satisfaction than the general population.
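The last two steps (find the $p$-value for the chosen tail, then compare it with $\alpha$) can be sketched as below. The function names and the `tail` labels are made up for this illustration. The value $z = -2.28$ is the one used in the example.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p_value, alpha):
    """We reject or fail to reject H0; we never 'accept' it."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = p_value_from_z(-2.28, tail="left")
print(f"p = {p:.4f}")                      # about 0.0113
print("alpha = 0.01:", decide(p, 0.01))
print("alpha = 0.05:", decide(p, 0.05))
```

Running it shows the goal-post effect directly: the same data fail to reject at $\alpha = 0.01$ but would reject at $\alpha = 0.05$.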

Here is your first presentation problem of the week. Suppose we had instead used $n = 25$ for a sample size and did not know the standard deviation but instead had to estimate it with $s = 6$. If we had obtained $\bar{x} = 44$, would we be able to conclude $H_1: \mu_{Fibro} < 50$ at the $\alpha = 0.05$ level of significance? This is called a t-test and is the subject of the next lecture.