Data Analysis: Comparing Sugar Content in Cereals, Student Sleep Habits, and More

Various data analysis project ideas, including comparing sugar content in cereals by shelf location using boxplots, estimating student sleep patterns through surveys, and testing brand preferences in cola taste tests. Other projects involve analyzing words per sentence in magazines, surveying how male and female students rate themselves against the average, and regressing student heights on their parents' heights.

Uploaded on 08/13/2009

koofers-user-3an

Math 308 Spring 2009 Project Ideas

Project Group: 4 people. You may choose. I will create 4-student groups by random selection if your group is smaller than 4 students.

Ideas: Note that emphasis is on the analysis and the presentation. Do not make your experiment too demanding!

  1. Go to a local grocery store and collect these data for at least 75 breakfast cereals: cereal name; grams of sugar per serving; and the shelf location (bottom, middle, or top). Group the data by shelf location and use three boxplots to compare the sugar content by shelf location. [Observational data; using boxplots to summarize data, can also be used for ANOVA test; high-sugar cereals are often at child-eye height.]
  2. Use computer software to simulate 1,000 flips of a fair coin. Record the fraction of the flips that were heads after 10, 100, and 1,000 flips. Repeat this experiment 100 times and then use three histograms to summarize your results. [Simulation data; using histograms to summarize data; demonstrates central limit theorem and effect of sample size on standard deviation.]
  3. Estimate the average number of hours that students at this school sleep each day, including both nighttime sleep and daytime naps. Also estimate the percentage who have been up all night without sleeping at least once during the current semester. [Survey data; confidence intervals for quantitative and qualitative data; students sleep less than 8 hours and many have all-nighters; if done at the beginning and end of the term, the differences are as expected.]
  4. Estimate and compare the average words per sentence in People, Time, and New Republic. [Observational data; confidence interval with quantitative data; the order given is from fewest words to most; New Republic has some outlier sentences with close to 100 words.]
  5. Estimate the percentage of the seniors at this college who regularly read a daily newspaper, the percentage who can name the two U.S. senators from their home state, the percentage who are registered to vote, and the percentage who would almost certainly vote if a presidential election were held today. [Survey data; confidence intervals for qualitative data; far more students are registered and will vote than read a newspaper or can name their senators.]
  6. Conduct a taste test of either Coke versus Pepsi or Diet Coke versus Diet Pepsi. Survey at least 50 randomly selected students who identify themselves beforehand as cola drinkers with a definite preference for one of the brands you are testing. Give each subject a cup of each cola that has been coded in a way known only to you. Calculate the fraction of your sample whose choice in the taste test matches the brand identified beforehand as their favorite. (Do not tell your subjects that this is a test of their ability to identify their favorite brand; tell them it is a test of which tastes better.) Determine the two-sided p-value for a test of the null hypothesis that there is a 0.5 probability that a cola drinker will choose his or her favorite brand. [Experimental data; hypothesis test using binomial model; most students prefer Coke, but neither group is very successful at identifying its favorite.]
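For the coin-flip simulation in idea 2, a minimal Python sketch follows. It uses only the standard library. The function names `heads_fractions` and `run_experiment`, the seed value, and the printed summary (standing in for the three histograms) are illustrative assumptions, not part of the assignment.

```python
import random
import statistics

CHECKPOINTS = (10, 100, 1000)  # flip counts at which the heads fraction is recorded


def heads_fractions(rng, checkpoints=CHECKPOINTS):
    """Flip a fair coin max(checkpoints) times; return {n: fraction of heads after n flips}."""
    heads = 0
    fractions = {}
    for flip in range(1, max(checkpoints) + 1):
        if rng.random() < 0.5:  # treat values below 0.5 as heads
            heads += 1
        if flip in checkpoints:
            fractions[flip] = heads / flip
    return fractions


def run_experiment(n_repeats=100, seed=308):
    """Repeat the 1,000-flip experiment; return {n: list of heads fractions}."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    results = {n: [] for n in CHECKPOINTS}
    for _ in range(n_repeats):
        for n, fraction in heads_fractions(rng).items():
            results[n].append(fraction)
    return results


results = run_experiment()
for n in CHECKPOINTS:
    spread = statistics.stdev(results[n])
    theory = 0.5 / n ** 0.5  # standard deviation of a sample proportion when p = 0.5
    print(f"n={n:5d}  mean={statistics.mean(results[n]):.3f}  "
          f"sd={spread:.3f}  theoretical sd={theory:.3f}")
```

Each list in `results` holds the 100 fractions for one histogram. The observed spread should shrink roughly in proportion to 1/sqrt(n), which is the sample-size effect the idea asks you to show.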

  7. Find five avid basketball players and ask each of them to shoot 100 free throws. Do not tell them the purpose of this exercise, which is to determine if a missed free throw is equally likely to bounce to the same or opposite side as their shooting hand. Use your data for each of these players to calculate the two-sided p-value for testing the null hypothesis that a missed free throw by this player is equally likely to bounce to either side. [Experimental data; hypothesis test using binomial model; coaches often say that the ball will bounce to shooting-hand side, but the data are unpersuasive.]
  8. Ask 50 female students these four questions: Among female students at this college, is your height above average or below average? Is your weight above average or below average? Is your intelligence above average or below average? Is your physical attractiveness above average or below average? Ask 50 male students these same questions (in comparison to male students at this college). Try to design a survey procedure that will ensure candid answers. For each gender and each question, test the null hypothesis that p = 0.5. [Survey data; hypothesis test using binomial model; most males think that they are above average.]
  9. Young children who play sports are often separated by age. In 1991, for example, children born in 1984 might have been placed in a 7-year-old league while children born in 1983 were placed in an 8-year-old league. Someone born in January 1984 is eleven months older than someone born in December 1984. Because coaches give more attention and playing time to better players, children with early birth dates may have an advantage when they are young that cumulates over the years. To test this theory, look at a professional sport and see how many players have birth dates during the first six months of the year. [Observational data; hypothesis test using binomial model; seems to be true.]
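The binomial-model ideas above (taste test, free throws, self-ratings, birth dates) all test the null hypothesis p = 0.5. One way to get an exact two-sided p-value, sketched here in Python, is to add up the probabilities of every outcome that is no more likely than the observed count. The function name `two_sided_binomial_p` and the example counts are hypothetical, chosen only for illustration.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for H0: p = 0.5, given `successes` out of `trials`."""
    # Under H0 each count k has probability C(trials, k) / 2**trials, so the
    # binomial coefficients can be compared as exact integers.
    weights = [comb(trials, k) for k in range(trials + 1)]
    observed = weights[successes]
    extreme = sum(w for w in weights if w <= observed)
    return extreme / 2 ** trials


# Hypothetical data, for illustration only: 31 of 50 subjects pick their stated favorite.
p_value = two_sided_binomial_p(31, 50)
print(f"two-sided p-value = {p_value:.4f}")
```

Because the null distribution is symmetric when p = 0.5, this gives the same answer as doubling the smaller tail probability and capping the result at 1.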

relationship. [Survey data; chi-square test; the start of each term is a popular time for romance.]

  1. Ask 50 randomly selected students this question and then compare the male and female responses: "You have a coach ticket for a nonstop flight from Los Angeles to New York. Because the flight is overbooked, randomly selected passengers will be allowed to sit in open first-class seats. You are the first person selected. Would you rather sit next to: (a) the U.S. president; (b) the president's wife; or (c) Michael Jordan?" [Survey data; chi-square test; females choose the president's wife, males the president.]
  2. For each of the 50 states, calculate Bill Clinton's percentage of the total votes cast for the Democratic and Republican presidential candidates in 1992; do not include votes for other candidates. Do the same for the 1996 election. Is there a statistical relationship between these two sets of data? Are there any apparent outliers or anomalies? [Observational data; simple regression; extremely strong correlation with a few anomalies.]
  3. Select an automobile model and year (at least three years old) that is of interest to you -- for example, a 1993 Saab 900S convertible. Now find at least 30 of these cars that are for sale (either from dealers or private owners) and record the odometer mileage (x) and asking price (y). As best you can, try to keep the cars as similar as possible. For example, ignore the car color, but do not mix together 4-cylinder and 6-cylinder cars or manual and automatic transmissions. Estimate the equation y = a + bx + e and summarize your results. [Observational data; simple linear regression; good fit with reasonable coefficients and interesting outliers.]
  4. Pick a date and approximate time of day (for example, 10:00 in the morning on April 1) for scheduling nonstop flights from an airport near you to at least a dozen large U.S. cities. Determine the cost of a coach seat on each of these flights and the distance covered by each flight. Use your data to estimate a simple linear regression model with ticket cost the dependent variable and distance the explanatory variable. Are there any outliers? [Observational data; simple linear regression; good fit with reasonable coefficients and interesting outliers.]
  5. Go to a large bookstore that has a prominent display of best-selling fiction and nonfiction hardcover books. For each of these two categories, record the price and number of pages for at least ten books. Use these data to estimate a multiple regression model with price the dependent variable and three explanatory variables: a dummy variable that equals 0 if nonfiction and 1 if fiction, the number of pages, and the dummy variable multiplied by the number of pages. Are there any apparent outliers in your data? [Observational data; multiple regression; good fit with reasonable coefficients and interesting outliers.]
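For the simple regressions in the car-price and airfare ideas, the least-squares estimates of a and b in y = a + bx + e have a closed form: b is the sum of cross-deviations divided by the sum of squared x-deviations, and a = mean(y) - b * mean(x). The sketch below is one way to compute them in plain Python. The function name `fit_line` and the mileage and price figures are made up for illustration. The multiple regressions are usually easier to estimate with statistical software.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return a, b, 1 - sse / sst


# Made-up data: odometer mileage in thousands of miles, asking price in dollars.
mileage = [42, 58, 65, 71, 88, 96, 110, 125]
price = [11900, 10500, 9800, 9900, 8200, 7900, 6500, 5900]

a, b, r_squared = fit_line(mileage, price)
print(f"price = {a:.0f} + ({b:.1f}) * mileage, R^2 = {r_squared:.3f}")
```

A negative slope would mean that the asking price falls as mileage rises. Cars whose price lies far from the fitted line are the candidates for the "interesting outliers" the ideas mention.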

  1. Ask 100 randomly selected students to estimate their height and the heights of both of their biological parents. Also note the gender of each student in your sample. Now estimate a multiple regression model with the student's height as the dependent variable and the student's gender, mother's height, and father's height as the explanatory variables. [Survey data; multiple regression; good fit with reasonable coefficients and evidence of regression toward the mean.]
  2. Purchase king-size (3.27 oz) and regular-size (1.74 oz) bags of peanut M&Ms from at least 3 different stores. Open each bag and count the number of peanut M&Ms of each color. Calculate the proportion of each color for each bag size. Obtain the expected color proportions from the M&Ms website. Perform a chi-squared goodness-of-fit test to determine whether the deviation from the proportions stated on the M&Ms website is greater than would be expected by chance.
  3. Find the team salaries for every team in three major sports leagues: the NBA, the MLB, and the NFL. Compile the collected data and compare the aggregate team salary to the number of wins the team had in the 2005-2006 season (or an earlier season). Is there a linear relationship?

Other similar questions:

  - Are the 6 flavors in Trix cereal uniformly distributed?
  - Is there a linear relationship between average income and percentage of citizens voting for George Bush in 2000 in the counties of the United States?
  - The higher the HS GPA, the more schools a student applies to. Valid?
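For the goodness-of-fit questions (the M&Ms color proportions and the Trix flavors), the chi-square statistic is the sum over categories of (observed - expected)^2 / expected, with degrees of freedom equal to the number of categories minus one. A minimal Python sketch, assuming a hypothetical helper `chi_square_gof` and made-up counts:

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Made-up counts for six flavors; H0 says each flavor has proportion 1/6.
counts = [48, 55, 39, 51, 60, 47]
statistic, df = chi_square_gof(counts, [1 / 6] * 6)
print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
# Compare with a chi-square table: with 5 degrees of freedom the 5% critical
# value is about 11.07, so a larger statistic rejects H0 at that level.
```

For the M&Ms idea, pass the proportions published by the manufacturer as `expected_props` in place of the uniform 1/6 values.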