AP Statistics: Regression Analysis

AP Statistics Study Session : Regression. David Friedman, David.Kit.Friedman@gmail.com

Topics Covered in Regression Analysis : Basic idea of regression; Pearson’s correlation coefficient, r; correlation does not imply causation; coefficient of determination, r²; how to find the LSRL (least squares regression line); formulas for regression.

Topics Covered in Regression Analysis : Residual plots; outliers and influential points; transformations to achieve linearity; questions.

Basic Idea of Regression : Let’s say we examine 10 people in a weight loss program and measure their height and weight.

Wolfram|Alpha can give height and weight statistics for the queries “adult weight statistics” and “adult height statistics”. Weight is skewed right, while height is more symmetric.

Thought Process : We conjecture that weight is generally proportional to height: the taller a person is, the more they weigh, and the shorter they are, the less they weigh. We can make a scatter plot of heights and weights.

Example Scatter Plots and Correlation Coefficients

Slide 9 : +r (strong correlation); +r (moderate correlation); -r (strong correlation); -r (moderate correlation); r ≈ 0 (random/uncorrelated)

Regression Line : There exists a procedure to calculate the least squares regression line (LSRL). The LSRL minimizes the sum of the squares of the vertical distances between the line and each point. It always goes through the point (x̄, ȳ), the point of the mean x-value and the mean y-value.
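The two properties above can be checked numerically. The following is a minimal sketch, not the calculator's procedure: the five height/weight pairs are invented for illustration, and the helper names `fit_lsrl` and `sum_squared_errors` are hypothetical.

```python
# Hypothetical (height in ft, weight in lbs) pairs, invented for illustration only.
heights = [5.0, 5.5, 5.8, 6.0, 6.3]
weights = [130.0, 155.0, 170.0, 190.0, 210.0]

def mean(values):
    return sum(values) / len(values)

def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = fit_lsrl(heights, weights)
best_sse = sum_squared_errors(heights, weights, b0, b1)

# Nudging the intercept or the slope away from the fitted values
# should never give a smaller sum of squares.
nudged_intercept_sse = sum_squared_errors(heights, weights, b0 + 1.0, b1)
nudged_slope_sse = sum_squared_errors(heights, weights, b0, b1 + 1.0)

# The fitted line evaluated at the mean height gives the mean weight.
value_at_mean = b0 + b1 * mean(heights)

print(best_sse, nudged_intercept_sse, nudged_slope_sse)
print(value_at_mean, mean(weights))
```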

Using the TI-84 to find the regression line : Put the height data into L1. Put the weight data into L2. The most comprehensive set of data is returned by LinRegTTest in the STAT->TESTS menu. Here we can see the regression equation: Weight = (-289.48 lbs) + (86.12 lbs/ft)*(height)

Using the regression equation to make a prediction : Perhaps we would like to predict how much somebody who is 6’2’’ ≈ 6.17 ft. weighs. We can use the regression equation to find a prediction for the weight of a 6.17 ft. person. Weight = (-289.48 lbs.) + (86.12 lbs./ft)*(6.17 ft.), so Weight ≈ 241.9 lbs.
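The same prediction can be written as a small function. This is only a sketch that plugs the slide's rounded coefficients into the regression equation; the names `predict_weight`, `INTERCEPT_LBS`, and `SLOPE_LBS_PER_FT` are made up for illustration.

```python
# Rounded coefficients taken from the regression equation on the slide.
INTERCEPT_LBS = -289.48
SLOPE_LBS_PER_FT = 86.12

def predict_weight(height_ft):
    """Predicted weight in lbs for a height given in feet."""
    return INTERCEPT_LBS + SLOPE_LBS_PER_FT * height_ft

predicted = predict_weight(6.17)  # 6'2'' expressed as roughly 6.17 ft
print(round(predicted, 1))        # prints 241.9
```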

Regression Equation for the Weight vs. Height Example : Weight = (-289.48 lbs) + (86.12 lbs/ft)*(height)

Tangent: Deriving the Regression Equations : Deriving the equations from the least squares principle is a multivariable calculus problem. We won’t cover that, but it is on page 499 of Jay Devore’s book Probability and Statistics for Scientists and Engineers, 5th edition (called PSSE in these review sessions). Devore’s equations on page 499 use different notation and are in a different form than the AP Statistics form. Algebraically, they are the same.

Surveymonkey Test : Teacher asks the class: What is your favorite search engine? Students can respond at: http://www.surveymonkey.com/s/HCH2DJ7

Questions about slope : Interpret the slope of the regression line, Weight = (-289.48 lbs) + (86.12 lbs/ft)*(height). You would say: “Weight is predicted to rise by 86.12 lbs. for every one-foot increase in height.” Quick question: Why might people challenge this wording? “In people, weight rises 86.12 lbs. for every one-foot increase in height.”

Extrapolation : Suppose we plot gallons of gas needed for 100 miles against cost for different models of a military vehicle. The regression equation for this data is y=(-1/5000)x+7, where x is the cost in dollars and y is the gallons needed. If the cost of the vehicle is $35,000, will the number of gallons needed be 0?
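Plugging costs into the equation shows why extrapolation is risky. The sketch below only evaluates the slide's equation; the function name and the particular costs tried are illustrative assumptions, since the slide does not give the range of costs in the data.

```python
def gallons_per_100_miles(cost_dollars):
    """Evaluate the slide's regression equation y = (-1/5000)x + 7."""
    return 7 - cost_dollars / 5000

# Costs chosen only for illustration; the actual data range is not given.
for cost in (10_000, 35_000, 40_000):
    print(cost, gallons_per_100_miles(cost))

zero_point = gallons_per_100_miles(35_000)
beyond_zero = gallons_per_100_miles(40_000)
```

The equation returns 0 gallons at $35,000 and a negative number of gallons past that, which is physically impossible. The line can only be trusted within the range of costs that was actually observed.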

Pearson’s Correlation Coefficient : A quantity that measures the strength and direction of the linear relationship between x and y. Denoted by r. The formula for r. Not on the AP Statistics formula sheet.
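The formula itself did not survive in this transcript. As a hedged illustration, the sketch below computes r from one standard definition (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations), which may not be the exact form shown on the slide. The function name `pearson_r` and the sample data are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: a perfectly increasing line and a perfectly decreasing line.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```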

Devore’s Formula : The formulas are the same algebraically. The formula is on pages 528-529 of Jay Devore, Probability and Statistics for Scientists and Engineers, 5th edition.

Correlation does not imply causation : One thousand people are given a questionnaire that asks about their knowledge of politics and politicians and about their health. It is found that knowledge of politics and politicians is correlated with better health. Does this imply that learning about politics and politicians improves one’s health?

Coefficient of determination : The coefficient of determination is r². It measures the proportion of the variation in y that is explained by the linear relationship with x. A coefficient of determination closer to 1 means that the regression line fits the data better. A coefficient of determination closer to 0 indicates that the regression line does not fit the data well.
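To make "proportion of variation explained" concrete, the sketch below computes the coefficient of determination as 1 - SSE/SST and compares it with the square of r. The data are invented for illustration, and the variable names are not from the slides.

```python
from math import sqrt

# Hypothetical (height in ft, weight in lbs) pairs, invented for illustration only.
xs = [5.0, 5.5, 5.8, 6.0, 6.3]
ys = [130.0, 155.0, 170.0, 190.0, 210.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# SSE: variation left over after fitting the line.
# SST: total variation of y about its mean.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sst = syy

r_squared = 1 - sse / sst
r = sxy / sqrt(sxx * syy)

print(r_squared, r ** 2)  # the two values agree up to rounding error
```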

Equations to find a Least Squares Regression Line
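The equations on this slide did not survive in the transcript. For reference, the standard least squares formulas are usually written as follows; the notation (b_0 for the intercept, b_1 for the slope, s_x and s_y for the sample standard deviations) is an assumption and may differ from what the slide showed.

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
    = r\,\frac{s_y}{s_x}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```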

Coefficient of determination : These formulas are in PSSE, 5th edition, pg. 506.

Questions : http://www.surveymonkey.com/s/HRMKHLX

Answers

Answer D: This is one of the main goals of regression analysis.

Slide 30 : Answer C: A two dimensional line is defined by a slope parameter and an intercept parameter.

Slide 31 : Answer A: The line will slope up to the right.

Outliers and Influential Points : An outlier is a point that does not fit the general pattern of the data. An influential point is one whose removal would have a large effect on the slope of the regression line. In regression analysis these terms do not have a specific mathematical definition.

Influential Points : We know that the regression line goes through (x̄, ȳ). One can think of the regression line as a rod nailed to the wall through the point (x̄, ȳ). Points that are large or small in the x-direction are farther from x̄ and will have more impact on the regression line.

Outliers and Influential Points : Outlier and influential point. Although this point doesn’t fit the pattern of the data, it isn’t as influential because its x-value is close to the mean.

Outliers and Influential Points : Point I is more influential than point II because its x-value is farther from the mean. Would point I be considered an outlier? Not necessarily: there is no specific mathematical definition in regression analysis.
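The effect of x-distance from the mean can be seen by adding one stray point to a small data set and refitting. This is a sketch on invented data, not the points from the slides; `slope_of` is a hypothetical helper. Both stray points sit 10 units below or above the original line, but only one is far from the mean x-value.

```python
def slope_of(xs, ys):
    """Slope of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Invented base data lying exactly on y = 2x, so the base slope is 2.
base_xs = [1, 2, 3, 4, 5]
base_ys = [2, 4, 6, 8, 10]
base_slope = slope_of(base_xs, base_ys)

# Stray point at the mean x-value (x = 3), 10 units above the line.
near_slope = slope_of(base_xs + [3], base_ys + [16])

# Stray point far from the mean x-value (x = 10), 10 units below the line.
far_slope = slope_of(base_xs + [10], base_ys + [10])

change_near = abs(near_slope - base_slope)
change_far = abs(far_slope - base_slope)
print(change_near, change_far)
```

The point at the mean x-value leaves the slope essentially unchanged, while the point far from the mean pulls the slope down noticeably, which matches the rod-and-nail picture.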

Fathom Simulations : A software package called Fathom can be used to do educational data analysis simulations. http://www.keypress.com/x5656.xml

Residual Plots : Definition of residual: Residual = actual - predicted, that is, Residual = y - ŷ. Keep in mind that the order makes a difference (the sign should be correct). In a residual plot, the residuals are graphed against the x-values.
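As a small worked illustration of the sign convention, the sketch below computes one residual using the weight vs. height regression equation from earlier in the session. The observed person (6.0 ft, 220 lbs) is invented, and the function name is hypothetical.

```python
# Rounded coefficients from the weight vs. height regression equation.
INTERCEPT_LBS = -289.48
SLOPE_LBS_PER_FT = 86.12

def predict_weight(height_ft):
    """Predicted weight in lbs for a height given in feet."""
    return INTERCEPT_LBS + SLOPE_LBS_PER_FT * height_ft

# Invented observation: a 6.0 ft person who actually weighs 220 lbs.
observed_height_ft = 6.0
observed_weight_lbs = 220.0

predicted_weight_lbs = predict_weight(observed_height_ft)

# Residual = actual - predicted. A negative residual means the point
# lies below the regression line.
residual = observed_weight_lbs - predicted_weight_lbs
print(round(predicted_weight_lbs, 2), round(residual, 2))
```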

Residual Plots : If the residual plot shows no definite pattern and the residuals are randomly scattered around 0, this is consistent with the idea that the model is a good fit. If the residual plot shows a definite pattern, then a non-linear fit may be more appropriate.

Examples of Residual Plots : Definite pattern (non-linear fit); no significant pattern.

Weight vs. Height Example

Data Transformations : If the residual plot indicates a non-linear relationship, a data transformation may be appropriate.

Example of a data transformation : Number of Internet users over time. Source: http://www.internetworldstats.com/emarketing.htm

Could fit a line : Correlation coefficient: r=0.9795419

Could do a quadratic transformation : Correlation coefficient: r=0.9938617. The residual plot looks a little more random as well.
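The idea can be sketched on made-up data. The Internet-user figures are not reproduced in this transcript, so the numbers below are invented to follow a roughly quadratic trend. The sketch also assumes that "quadratic transformation" means regressing y on the square of x, which is one plausible reading of the slide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data that grows roughly like x squared, with a little noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 3.9, 9.3, 15.8, 25.4, 35.7]

# Correlation of y with x itself (fitting a straight line to the raw data).
r_linear = pearson_r(xs, ys)

# Correlation of y with x squared (fitting a line after transforming x).
xs_squared = [x ** 2 for x in xs]
r_transformed = pearson_r(xs_squared, ys)

print(r_linear, r_transformed)
```

On data like this, the correlation after the transformation is closer to 1, which mirrors the comparison on the slides. A residual plot is still the better check that the transformed relationship is linear.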

Preparation and background in regression : Was taught the material at UVa. back in fall 2000 (covered in chapter 12 of Jay Devore’s book, PSSE). Relearned the material from Duane Hinders’s book 5 Steps to a 5: AP Statistics (covered in chapter 7: Two variable data analysis). For this lecture, went over Martin Sternstein’s book, where it is covered in Topic 4: Exploring Bivariate Data.

Martin Sternstein’s Exploring Bivariate Data Multiple-choice questions : Did all 31 questions on pages 88-98.

Multiple-choice Review : Difficult questions: 4, 6, and 27. Sternstein’s explanations can be somewhat terse, but the problems can be helpful in preparing for the exam. In question 3, Sternstein means log base 10 and not the natural log. I missed 5 questions but got the others correct.

Calculator : Sternstein’s book does not cover the use of a calculator. This knowledge can be gained from the calculator manual. The TI-84 is one of the most popular calculators for AP Statistics (and for AP Calculus), and Hinders covers use of this calculator in his book.

Next week we can cover probability, which Sternstein covers in topic nine and Hinders covers in chapters 9 and 10.

Questions : http://www.surveymonkey.com/s/H2MB982 Answers will then be posted along with student response performance on http://www.dkfriedman.name. Questions will be closed Wednesday 03/03/2010 at 11:59 P.M. EDT.

Hungry for more statistics, science and math? : Feel free to check out Khan Academy, which has many free videos on a wide range of subjects, including statistics. Links: http://www.khanacademy.org/ http://www.pbs.org/newshour/bb/north_america/jan-june10/khan_02-22.html