Chi-Square Goodness-of-Fit Test: Analyzing Categorical Data with Chi-Square Statistic - Pr, Study notes of Statistics

Fayetteville State University (FSU), Statistics

Prof. David S. Wallace

An outline of lesson 19 on the chi-square goodness-of-fit test, which is used to analyze categorical data and test whether observed frequencies differ from expected frequencies. It covers the concept of categorical data, the hypothesis testing process, and the calculation of the chi-square statistic. It also includes an example of applying the test to insurance claims data.

Typology: Study notes

Pre 2010

Uploaded on 08/01/2009

Lesson 19 Chi-Square

Outline

Categorical Data

Goodness of Fit Test

-observed frequency

-expected frequency

-χ² statistic

Example

-hypothesis testing

Categorical Data

As mentioned at the start of the lesson on correlation, all of the data we have worked with so far have been measurement data: we took measurements from the units in our sample to create our distribution. Often, however, we will want to analyze categorical (qualitative) data as well. For categorical data we will not have a measurement for each individual unit in the sample. Instead, we will analyze frequencies, or counts, of people falling into different categories or groups. Tests for categorical data are called non-parametric tests; all the tests we have learned before this point were parametric tests.

Chi-Square Goodness-of-Fit Test

We will learn two different Chi-square tests. The first of these is the goodness-of-fit test. With the goodness-of-fit test we will test whether the data “fit good” with what we would expect if only chance factors were operating. For example, if I measured the number of insurance claims for different car types, I might have the following data:

High Performance   Compact   Mid Size   Full Size
20                 14        7          9

Notice that our data are now frequencies: counts of how many observations in our sample fall into each category. The test will tell us whether there is a difference in how many observations fall at the different levels of a single variable (car type). Is there a difference in the number of claims for different car types?

The values we observe in our sample are the observed frequencies ($f_o$). What we want to know is whether they differ from the frequencies we would observe by chance. The values we would expect if there really was no difference in the number of claims made for different car types are what we call the expected frequencies ($f_e$). If there really was no difference in the frequencies for each level of the variable, then we would expect equal numbers of claims for each car type. Since there are a total of 50 claims in our sample, and there are 4 different levels of the variable, we would expect 50 / 4 = 12.5 claims for each car type. Thus:

            High Performance   Compact   Mid Size   Full Size
Observed    20                 14        7          9
Expected    12.5               12.5      12.5       12.5
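The expected frequencies in this table can be reproduced with a few lines of Python. This is a minimal sketch that is not part of the original lesson; the variable names are illustrative, and it assumes the no-difference case in which claims are spread equally across the car types.

```python
# Observed claim counts from the example table.
observed = {
    "High Performance": 20,
    "Compact": 14,
    "Mid Size": 7,
    "Full Size": 9,
}

total_claims = sum(observed.values())   # 50 claims in the sample
num_categories = len(observed)          # 4 car types

# Under "no difference", every category gets an equal share of the total.
expected_per_category = total_claims / num_categories
expected = {car_type: expected_per_category for car_type in observed}

print(total_claims, num_categories, expected_per_category)  # 50 4 12.5
```

The equal split only applies because the example assumes no difference between car types; a different null hypothesis would call for different expected counts.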

What the Chi-square statistic does is compare the values we observe to those we would expect if there was no difference. If what we observe varies a good bit from the values we would expect if there was no difference, then we conclude that there is a difference. If there really was no difference in the number of insurance claims in this example, then we would expect the observed number of claims to be close to the expected frequencies.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Notice that we subtract each expected value from each observed value, square the difference, and divide by the expected frequency. We then sum up all of the values we computed.
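The computation just described (subtract, square, divide by the expected frequency, then sum) can be sketched in Python. The function name `chi_square_statistic` is hypothetical and chosen only for illustration; the data are the insurance claim counts from the example, with 12.5 expected claims per car type.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))


# Claim counts for High Performance, Compact, Mid Size, Full Size.
observed = [20, 14, 7, 9]
expected = [12.5, 12.5, 12.5, 12.5]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 8.08
```

The squared differences are 56.25, 2.25, 30.25, and 12.25, which sum to 101; dividing by 12.5 gives 8.08.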

Let’s take a look at the example we have been working on within the context of hypothesis testing. We will continue the problem with Alpha set to .05.

Step 1: Write the Hypotheses for the test.

$H_0: f_o = f_e$

$H_1: f_o \neq f_e$

For the null hypothesis, we state that the observed frequencies are the same as the expected frequencies. The alternative hypothesis states that they differ.

Step 2: Find the Critical Value

Again we will use Appendix A to find the critical value (see page A-34). For our test, the degrees of freedom are equal to C – 1, where C is the number of categories.

df = 4 – 1 = 3

$\chi^2_{critical}$ = 7.
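As a cross-check on the table lookup, the critical value for this particular case can be approximated numerically. The sketch below is an illustration under stated assumptions, not the lesson's method (the lesson reads the value from Appendix A). It relies on a closed-form expression for the chi-square cumulative distribution that holds only for 3 degrees of freedom, and both function names are hypothetical.

```python
import math


def chi_square_cdf_df3(x):
    """CDF of the chi-square distribution, valid only for 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value_df3(alpha, low=0.0, high=100.0, tolerance=1e-9):
    """Bisection search for the x that leaves `alpha` in the upper tail."""
    target = 1 - alpha
    while high - low > tolerance:
        mid = (low + high) / 2
        if chi_square_cdf_df3(mid) < target:
            low = mid      # not enough area to the left yet, move right
        else:
            high = mid     # too much area to the left, move left
    return (low + high) / 2


num_categories = 4
df = num_categories - 1              # C - 1 = 3
critical = critical_value_df3(0.05)  # alpha = .05, as in the example
print(df, round(critical, 2))        # 3 7.81
```

For other degrees of freedom this closed form does not apply, so a printed table or a statistics library would be needed instead.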

Step 3: Run the Statistical Test

We have already computed the expected values, so we just need to plug the numbers into the formula.

            High Performance   Compact   Mid Size   Full Size
Observed    20                 14        7          9
Expected    12.5               12.5      12.5       12.5