Fundamentals of Statistical Inference, Exams of Statistics

This document covers the key concepts and techniques in statistical inference, including the mean and standard deviation, the 68-95-99.7 rule, z-scores, hypothesis testing, effect size, statistical power, confidence intervals, and linear regression. It gives an overview of the statistical methods used to draw conclusions from sample data and make inferences about the underlying population, and it stresses the importance of understanding these principles and their practical applications in fields such as psychology, economics, and the social sciences. It also discusses the assumptions and limitations of these techniques, as well as strategies for interpreting and communicating statistical findings effectively.

Typology: Exams

2024/2025

Available from 10/14/2024

St Petersburg College Florida
Statistics (Stat 2023) Exam
Quiz Module 4
Course Title and Number:Statistics (Stat 2023) Quiz module
4
Exam Title:[Insert Exam Title]
Exam Date:[Insert Exam Date]
Instructor:[Insert Instructor’s Name]
Student Name:[Insert Student’s Name]
Student ID:[Insert Student ID]
Examination
180 minutes
Instructions:
1. Read each question carefully.
2. Answer all questions.
3. Use the provided answer sheet to mark your
responses.
4. Ensure all answers are final before submitting the
exam.
Good Luck!


What term refers to a frequency distribution that follows a bell-shaped, symmetrical, and unimodal curve? - Answer>> normal distribution

In a normal distribution, the mean is located where? - Answer>> in the middle of the curve

T/F For a normal curve, the median, mean, and mode are typically equal. - Answer>> True

T/F The area below the curve is 120%. - Answer>> False. 100% or 1.

T/F The greater the standard deviation, the less spread out the normal curve. - Answer>> False. The greater the standard deviation, the more spread out the normal curve. The smaller the standard deviation, the narrower the normal curve.

A normal distribution can be defined by its ___ and ______. - Answer>> mean and standard deviation

The __-__-____ rule - Answer>> 68-95-99.7. However, the rule only works when values are exactly 1, 2, or 3 standard deviations away from the mean. In order to apply the concept of proportions to other standard deviation values, such
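The 68-95-99.7 proportions can be checked numerically. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist`; the helper name `proportion_within` is hypothetical and not part of the quiz material. Because it accepts any `k`, it also covers values that are not exactly 1, 2, or 3 standard deviations from the mean.

```python
from statistics import NormalDist


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard
    deviations of the mean (hypothetical helper for illustration)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3, 1.5):
    print(f"within {k} SD of the mean: {proportion_within(k):.4f}")
```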

Josef: X = 87 - 1.75(4) = 80
Marco: X = 87 - 2.50(4) = 77
Brooklyn: X = 87 + 1.25(4) = 92

z-Scores for Sample Means from a Population - Answer>> z = (sample mean - population mean) / (population standard deviation / sqrt(sample size))

Consider the example of the exam scores from Figure 4.3 (μ = 75, σ = 5). Suppose the teacher would like to know the proportion of scores on the exam that are below 65. - Answer>> 1. Transform the raw score to a z-score: z = (65 - 75) / 5 = -2.00. 2. Use the normal table to find the proportion for that z-score. The proportion of scores below z = -2.00 is 0.02275, or 2.275%. Therefore, a test score below 65 is unusual, since only about 2.275% of the scores fall below it.

a range of values that is likely to contain the true population mean. - Answer>> confidence interval

What states that the distribution of all sample means approaches a normal distribution centered on the population mean? This implies that a sample mean will tend to be close to the population mean. - Answer>> The Central Limit Theorem

95% confidence interval - Answer>> The 95% confidence interval, for example, means that 95% of the intervals computed from repeated experiments with the given treatment will contain the true population mean. Consequently, 5% (or 1 in 20) of those intervals will not contain the true population mean. A 95% confidence interval implies that the researcher is 95% confident that the population mean lies in the interval that is centered around the sample mean.

margin of error - Answer>> zcrit * (population standard deviation / sqrt(sample size))
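The z-score arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names are hypothetical, the normal table is replaced by `statistics.NormalDist`, and the sample-mean example (mean 77, n = 25) uses made-up numbers.

```python
from math import sqrt
from statistics import NormalDist


def raw_score(mean: float, sd: float, z: float) -> float:
    """Convert a z-score back to a raw score: X = mean + z * sd."""
    return mean + z * sd


def z_for_sample_mean(sample_mean: float, pop_mean: float,
                      pop_sd: float, n: int) -> float:
    """z-score of a sample mean: (M - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


# Raw scores for the three students (mean 87, standard deviation 4).
for name, z in (("Josef", -1.75), ("Marco", -2.50), ("Brooklyn", 1.25)):
    print(name, raw_score(87, 4, z))

# Proportion of exam scores below 65 when mu = 75 and sigma = 5.
p_below_65 = NormalDist(mu=75, sigma=5).cdf(65)
print(f"proportion below 65: {p_below_65:.5f}")

# Hypothetical sample: mean 77 from n = 25 scores drawn from that population.
print("z for sample mean:", z_for_sample_mean(77, 75, 5, 25))
```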

T/F Hypothesis testing can determine the absolute size of the effect of the treatment. It can determine whether the treatment caused a substantial effect. - Answer>> False. It cannot determine the absolute size of the effect of the treatment. In other words, it cannot determine whether the treatment caused a substantial effect.

What researchers calculate to determine the absolute magnitude, or size, of the treatment effect. One measure for this is... - Answer>> effect size. One measure for effect size is Cohen's d. Cohen's d = mean difference / standard deviation = (sample mean - population mean) / standard deviation

d = 0.2, d = 0.5, d = 0.8 - Answer>> small effect, medium effect, large effect

the probability of correctly rejecting a null hypothesis that is false - Answer>> statistical power

Power = 1 - Beta - Answer>> In other words, if there is an effect, the power describes the likelihood that the study will provide evidence of the effect. Power is calculated prior to beginning a research study to define the probability of committing a Type II error (failing to reject a false null hypothesis), a value known as β. The statistical power of a study is measured on a scale of 0 to 1. Thus, researchers can calculate statistical power by using the following formula: Power = 1 - β
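To make the relationship Power = 1 - β concrete, here is a rough sketch for a one-sample z test with known σ. The notes do not say which test their power formula refers to, so the test choice, the function name `z_test_power`, and the example numbers (d = 0.5, n = 25) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(d: float, n: int, alpha: float = 0.05,
                 two_tailed: bool = True) -> float:
    """Approximate power of a one-sample z test (assumed setting).

    d is the true effect size (Cohen's d), n the sample size.
    The one-tailed branch assumes the effect lies in the predicted
    (positive) direction.
    """
    std_normal = NormalDist()
    shift = d * sqrt(n)  # expected value of z when the effect is real
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)   # reject in upper tail
        lower = std_normal.cdf(-z_crit - shift)      # reject in lower tail
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - shift)


power = z_test_power(d=0.5, n=25)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Changing `alpha`, `d`, `n`, or `two_tailed` in this sketch shows how the significance level, the effect size, the sample size, and the type of test each move the power.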

A ______ test involves a directional hypothesis, whereas a ______ test involves a non-directional hypothesis. - Answer>> one-tailed, two-tailed

What is the critical area? - Answer>> The area under the curve containing extreme values that rarely occur in the distribution. If the test statistic falls in the critical region, the result is statistically significant and you reject the null hypothesis.

What is effect size? - Answer>> While hypothesis testing can determine statistical significance, effect size can tell us how meaningful or impactful a particular result is.

Which of the following factors affects statistical power?
a. Type of hypothesis test
b. Significance level
c. Effect size
d. All of the above
- Answer>> All of the above

You are conducting an experiment that is expected to increase the mean scores for participants in the population. If the population mean is μ = 75, which statement below is the correct alternative hypothesis (Ha) for a one-tailed test?
a. μ > 75
b. μ ≥ 75
c. μ < 75
d. μ ≤ 75
- Answer>> A

A sample of n = 12 is selected from a population whose mean is μ = 80 (σ = 12). After a treatment is applied to the sample, the size of the treatment effect is d = 0.25. What was the sample mean?
a. x̄ = 79
b. x̄ = 81
c. x̄ = 83
d. x̄ = 85
- Answer>> C. d = 0.25 and d = (x̄ - μ) / σ, so x̄ = d(σ) + μ = 0.25(12) + 80 = 83
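The last answer rearranges the Cohen's d formula to solve for the sample mean. A small sketch of both directions follows; the function names are hypothetical and chosen only for this illustration.

```python
def cohens_d(sample_mean: float, pop_mean: float, sd: float) -> float:
    """Cohen's d = (sample mean - population mean) / standard deviation."""
    return (sample_mean - pop_mean) / sd


def sample_mean_from_d(d: float, pop_mean: float, sd: float) -> float:
    """Rearranged formula: sample mean = d * sd + population mean."""
    return d * sd + pop_mean


# Quiz problem: mu = 80, sigma = 12, d = 0.25.
mean = sample_mean_from_d(0.25, 80, 12)
print(mean)                    # 83.0
print(cohens_d(mean, 80, 12))  # 0.25
```

Note that the sample size n = 12 never enters the calculation, which is consistent with Cohen's d not depending on sample size.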

In a study, how does the sample size affect the rejection of the null hypothesis and Cohen's d? Assume other factors are constant.
a. A smaller sample size increases the likelihood of rejecting the null hypothesis and decreases Cohen's d.
b. A larger sample size increases the likelihood of rejecting the null hypothesis and increases Cohen's d.
c. A small sample size decreases the likelihood of rejecting the null hypothesis and does not affect Cohen's d.
d. A larger sample size increases the likelihood of rejecting the null hypothesis and does not affect Cohen's d.
- Answer>> d. A larger sample size increases the likelihood of rejecting the null hypothesis and does not affect Cohen's d.

Why is it important to measure effect size? - Answer>> While hypothesis testing can determine statistical significance, effect size can tell us how meaningful or impactful a particular result is.

What is a confidence interval? - Answer>> A confidence interval is a range of values that is likely to contain the true population mean. The 95% confidence level is the most common.

What two things are used to calculate a confidence interval? - Answer>> Point estimate and margin of error

a sample statistic that is used to estimate the population parameter, e.g. x bar - Answer>> point estimate

a range of values likely to contain the population parameter - Answer>> confidence interval

95% confidence represents - Answer>> range of values where you would fail to reject the null hypothesis
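A confidence interval built from a point estimate and a margin of error can be sketched as follows. This assumes a z-based interval with a known population standard deviation; the function names and the example numbers (sample mean 83, σ = 12, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(pop_sd: float, n: int, confidence: float = 0.95) -> float:
    """zcrit * (population standard deviation / sqrt(sample size))."""
    # Two-tailed critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * pop_sd / sqrt(n)


def confidence_interval(sample_mean: float, pop_sd: float, n: int,
                        confidence: float = 0.95) -> tuple[float, float]:
    """Point estimate plus and minus the margin of error."""
    moe = margin_of_error(pop_sd, n, confidence)
    return sample_mean - moe, sample_mean + moe


# Hypothetical sample: mean 83 from n = 36 with sigma = 12.
low, high = confidence_interval(sample_mean=83, pop_sd=12, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

A null-hypothesis value that falls inside this interval would not be rejected by the matching two-tailed test at α = 0.05.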

Since this is a two-tailed test at α = 0.05, zcrit = ±1.96.

  1. Calculate the Test Statistic
  2. Compare and Decide: Since z = -1.57 is not more extreme than the critical value of ±1.96, we fail to reject the null hypothesis. The result is not statistically significant. The results should be reported as follows: z = -1.57, p > 0.05, two-tailed

2 Sample t test used for... - Answer>> Comparison of 2 group means

ANOVA used for... - Answer>> Comparison of 3 or more group means

Which chart/graph is used to show correlation? - Answer>> Scatter Plot

Correlation Overview - Answer>>
  • Measures association between 2 numeric variables
  • Correlation coefficient and significance
  • Correlation and causality

Correlation - Answer>> The simplest measure of a relationship between 2 variables is given by the correlation. a) If one variable increases, does the other increase too? b) Or does one decrease when the other increases? c) Or is there no relationship?

Correlation Coefficient - Answer>>
  • "r" is the Pearson correlation coefficient
  • x, y are 2 variables with means denoted by x̄ and ȳ
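The Pearson coefficient r can be computed as the average product of the two standardized (z-scored) variables. The sketch below assumes the sample convention, dividing by n - 1 both for the standard deviations and for the average; the function name and the small data set are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson r as the average product of z-scores (n - 1 convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Standardize each variable with the z-score transformation.
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r(x, y) = {pearson_r(x, y):.4f}")
print(f"r(y, x) = {pearson_r(y, x):.4f}")  # same value: r is symmetric
```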

What is the Pearson Correlation Coefficient? - Answer>> The average product of two standardized variables

How can normal variables x and y be transformed to standard normals? - Answer>> Use the z-score transformation

Properties of the Pearson Correlation Coefficient - Answer>>
  • r can take values between -1 and 1
  • r = 0: no correlation
  • 0 < r < 1: positive correlation
    • Similar relationship
    • If one goes up, so does the other
  • -1 < r < 0: negative correlation
    • Inverse relationship
    • If one goes up, the other goes down
  • The definition of r is symmetric in x and y: r(x,y) = r(y,x)
  • No dependent and independent variables

4 Graphical ways the line can go - Answer>>
  • (+): the line will be straight and go up from left to right
  • (-): the line will be straight and go down from left to right
  • No relationship: the line will just go straight across
  • Nonlinear: the line will be curved. A Pearson correlation coefficient can't be used if the line is curved (nonlinear)

Is r (Pearson coefficient) small or large - Answer>> Use Cohen's guidelines:
  • Small: r in [0.1, 0.3] (+) correlation OR r in [-0.3, -0.1] (-) correlation
  • Medium: r in [0.3, 0.5] (+) correlation
  • Causality requires more evidence than merely a significant correlation.
  • For instance, two variables may be correlated due to the impact of a third variable that was not measured.
  • Other examples that could be misinterpreted:
    a) Crime and number of police: They may both be increasing, but is a high crime rate causing more police, or are more police causing a higher crime rate? The correlation alone cannot determine which one is causing the other (causality).
    b) Health of community and number of nurses: They may both be increasing, but is a high number of health problems causing an increase in the number of nurses, or is a high number of nurses causing an increase in health problems? The correlation alone cannot determine which one is causing the other (causality).

Basic Mathematics of a line - Answer>>
  • Slope
  • Intercept

Simple Linear Regression - Answer>>
  • Predict one numeric variable from knowledge of another
  • Example 1: Haque and Zaritsky modeled expected systolic blood pressure for a child in terms of its age ... So if you know the child's age, you know what SBP to expect
  • Example 2: Mooney, Holmes, and Christie modeled the total number of flu cases expected in any year in terms of the highest weekly growth rate of flu cases

Regression - Answer>>
  • The word regression is due to Francis Galton in his 1885 paper on the inheritance of stature.
  • Regression refers to the phenomenon whereby descendants of parents of extreme stature tend toward the average height of the population.
  • While the term regression says something about the behavior of residuals, the main idea is to fit a line to the data at hand.

What is a line? - Answer>>
  • In a scatter plot, the pattern must resemble a straight line, or a cigar shape in practice.
  • Algebraically, the equation of a straight line is y = b0 + b1 x, where b0 is the intercept and b1 is the slope
  • In terms of the standardized variables xz and yz, the line is yz = β xz, so β replaces b1 as the slope (subscript z denotes standardization)
  • y = dependent variable, x = independent variable

Slope - Answer>>
  • The slope provides the rate of change.
  • It represents the amount by which y changes for a single unit change in x
  • Slope can take any value, positive or negative or zero

Intercept - Answer>>
  • The intercept is the value of y when x is zero, i.e. the value of y at which the line intersects the y-axis
  • Intercept can take any value, positive or negative or zero

b1 or Beta - Answer>> "rise over run"

Other names for slope, intercept - Answer>> The slope and intercept are also called model coefficients

All 3 of these are squared and summed at the end

Least Squares Solution continued: Minimizing the Residual Sum of Squares - Answer>>
  • The main idea is to find b0 and b1 that minimize S.
  • The algebra is easiest to write and understand in terms of standardized variables
  • In this scheme, we transfer over to the standardized variables, get the solution, then transfer back to the solution in terms of the original variables.

The Solution that minimizes "S" is... - Answer>>
  • The standardized slope equals the correlation coefficient
  • This is the Pearson "r" correlation coefficient

Results of study by Haque and Zaritsky - Answer>>
  • Systolic blood pressure (5th percentile at 50th height percentile) was modeled on child's age: SBP5 = 65 + (2 x age)
  • Ex: for a 10 y/o child, SBP5 = 65 + (2 x 10) = 65 + 20 = 85
  • This denotes that only 5% of 10-year-old kids would have a SBP < 85 and that 95% would have a SBP > 85.
  • SBP5 denotes systolic blood pressure at the 5th percentile at 50th height percentile
  • Age is measured in years
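The least squares recipe (standardize, take the slope equal to r, then transfer back to the original units) can be sketched as below. The function name `fit_line` is hypothetical, and the data points are synthetic values generated from the published line SBP5 = 65 + (2 x age) purely to show that the fit recovers it; they are not the study's data.

```python
from math import sqrt


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least squares line via the standardized slope.

    Returns (b0, b1, r): the standardized slope is r, and transferring
    back to original units gives b1 = r * sd_y / sd_x and
    b0 = mean_y - b1 * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)   # slope for the standardized variables
    b1 = r * sqrt(syy / sxx)    # slope in the original units
    b0 = mean_y - b1 * mean_x   # intercept in the original units
    return b0, b1, r


# Synthetic points lying exactly on SBP5 = 65 + 2 * age (not real data).
ages = [2, 4, 6, 8, 10]
sbp5 = [65 + 2 * age for age in ages]
b0, b1, r = fit_line(ages, sbp5)
print(f"SBP5 = {b0:.1f} + {b1:.1f} x age (r = {r:.2f})")
print("predicted SBP5 at age 10:", b0 + b1 * 10)
```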
  • Age is measured in years

What happens to the line when a log is used? - Answer>> It curves

Evaluating the Regression Model - Answer>>
  • Hypothesis Tests:
    a) F test (ANOVA) of model
    b) t tests of slope and intercept
  • Model goodness-of-fit
  • Were assumptions met?
    • Residuals are normal, homoscedastic
    • Variables are linearly related
    • Observations are independent

Hypothesis tests - Answer>>
  • The primary hypothesis of the simple linear regression model is about the existence of a linear relationship
    Null hypothesis, H0: β = 0 (β is the standardized slope, aka the Pearson r correlation coefficient)
    Alternative hypothesis, H1: β ≠ 0
    F test is used
  • The secondary hypotheses are about the slope and intercept
    Null hypothesis, slope = 0
    Null hypothesis, intercept = 0
    t tests are used (2 of the 3 tests will have the same p-values)

ANOVA for regression model - Answer>> The significance of the regression model is assessed by analysis of variance after decomposing the total sum of squares into regression and residual components: Total SS = Residual SS + Regression SS
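The sum-of-squares decomposition behind the regression ANOVA can be illustrated with a short sketch. The function name and the five data points are made up; the F statistic is computed with 1 and n - 2 degrees of freedom, which is the usual setup for simple regression, and no p-value is calculated here.

```python
def regression_anova(xs: list[float], ys: list[float]) -> dict[str, float]:
    """Decompose Total SS into Regression SS and Residual SS for a
    simple linear regression, and compute the F statistic."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]

    total_ss = sum((y - mean_y) ** 2 for y in ys)
    regression_ss = sum((f - mean_y) ** 2 for f in fitted)
    residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    # F has 1 and n - 2 degrees of freedom in simple regression.
    f_stat = (regression_ss / 1) / (residual_ss / (n - 2))
    return {
        "total_ss": total_ss,
        "regression_ss": regression_ss,
        "residual_ss": residual_ss,
        "f_stat": f_stat,
        "r_squared": regression_ss / total_ss,
    }


# Made-up data for illustration only.
result = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```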

So for F(1, 18), that means the sample size is 20, because it will always be F(1, ...) and the 18 is from 20 - 2. So for F(1, 58), that means the sample size is 60, because it will always be F(1, ...) and the 58 is from 60 - 2.

t tests of coefficients - Answer>> The null hypotheses are that: (1) slope = 0, and (2) intercept = 0. Reject the null if t > t(n-2, crit) or if t < -t(n-2, crit), where t(n-2, crit) is the critical value, generally the 95% probability point, of the t distribution with n - 2 degrees of freedom

95% confidence intervals - Answer>> The 95% confidence intervals of the regression coefficients are obtained from the standard error of each coefficient and the t distribution:
Slope: b1 ± t(n-2, crit) SE(b1)
Intercept: b0 ± t(n-2, crit) SE(b0)
(SE: Standard Error)

R2 goodness-of-fit - Answer>>
  • R2 is a summary measure of goodness-of-fit that is widely used

  • Definition: R2 = Reg SS / Total SS
  • R2 is a number between 0 and 1, with high values indicating a good model
  • It is the proportion of variance of the DV that is explained by the linear model in terms of the IV. (DV: Dependent Variable, IV: Independent Variable) Note: The IV is also called the predictor in regression models

Remember:
  a) The denominator will ALWAYS be at least as big as the numerator, so that means...
  b) R2 can NEVER be greater than 1 and it will ALWAYS be between 0 and 1.
  c) The closer to 1, the better the model

How to judge a model using R2 - Answer>>
  • In simple regression (1 DV + 1 IV), R is rigidly tied to the slope: R = β = r.
  • Adapting the Cohen guidelines for r to R2 suggests that it is:
    Small, if R2 is in [0.01, 0.09] (1% - 9%)
    Medium, if R2 is in [0.09, 0.25] (9% - 25%)
    Large, if R2 > 0.25 (> 25%)

Assumptions of Regression - Answer>>
  1. Ratio of cases to IVs must be substantial. A rule of thumb is: N > 50 + 8k for multiple regression with k IVs, i.e. N > 58 for simple regression.
  2. Outliers must be removed from the IVs and the DV
  3. Independence: data values (both DV and IV) are independent of each other. Data must not be collected repeatedly from the same subject. Example of a violation: interviewer learning curve for a questionnaire. Other assumptions of regression must be tested after computing the model.
  4. Normality: Residuals (errors) are normally distributed
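The R2 guidelines and the cases-to-IVs rule of thumb above can be turned into two small helpers. This is only a sketch: the function names are hypothetical, the "negligible" label for R2 below 0.01 is an added assumption, and the shared boundary value 0.09 is assigned to "medium" because the notes list it in both ranges.

```python
def classify_r_squared(r_squared: float) -> str:
    """Label R2 using the adapted Cohen guidelines from the notes."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R2 must lie between 0 and 1")
    if r_squared > 0.25:
        return "large"
    if r_squared >= 0.09:   # assumption: 0.09 counts as medium
        return "medium"
    if r_squared >= 0.01:
        return "small"
    return "negligible"     # assumption: label not given in the notes


def cases_threshold(k: int) -> int:
    """Rule of thumb: the number of cases N should exceed 50 + 8k,
    where k is the number of independent variables."""
    return 50 + 8 * k


print(classify_r_squared(0.6))  # large
print(cases_threshold(1))       # 58, so N > 58 for simple regression
```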