Correlation and Linear Regression
CORRELATION AND REGRESSION

CORRELATION

Correlation concerns the relationship between two random variables. Pairs of observations of such variables produce a bivariate distribution. We are interested in the linear correlation between the two variables. Correlation may be positive (in which case both variables increase together) or negative (in which case one decreases when the other increases). If no pattern can be seen, then there may be no correlation between the two variables.

Generally, if we wish to investigate the correlation between two variables, we first plot a scatter diagram. We can then look at the diagram to see if there are any rogue values (outliers). These may need to be investigated and possibly discounted (removed), but this part of the topic is not part of your syllabus.

Correlation is measured by the product-moment correlation coefficient (PMCC) 'r', which is generally just called the correlation coefficient:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$

where

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

These may look complicated, but if expressed in terms of tabulated values they are easy to see and calculate, as the table layout below shows:

    Columns:   x_i          y_i          x_i^2          y_i^2          x_i y_i
    Totals:    Σx_i = (A)   Σy_i = (B)   Σx_i^2 = (C)   Σy_i^2 = (D)   Σx_i y_i = (E)

In terms of the column totals (A), (B), (C), (D) and (E):

$$S_{xy} = (E) - \frac{(A)(B)}{n}, \qquad S_{xx} = (C) - \frac{(A)^2}{n}, \qquad S_{yy} = (D) - \frac{(B)^2}{n}$$

The correlation coefficient will always lie between -1 and +1, that is $-1 \le r \le +1$. Perfect negative correlation has r = -1. Perfect positive correlation has r = +1. If r = 0, there is no linear correlation at all. Generally, if r is close to +1 or -1 then correlation is considered to be high, and if r is close to 0 (either negative or positive) correlation is considered to be low.

It may often be useful to use coding on the x and y values when measuring correlation, as this may simplify the values being analysed. Linear coding (adding or subtracting a constant, or multiplying or dividing by a positive constant) has no effect at all on the final value of r.

REGRESSION

If we are investigating the relationship between two random variables and we find that there is a strong relationship between them, then we may try to find a law or equation that links the two variables. We will only be looking for a linear equation, and the straight line obtained is called the regression line.

The regression line of y on x is y = a + bx, where x is called the explanatory (or independent) variable and y is called the response (or dependent) variable, because the y value DEPENDS on the x value. The explanatory variable is always plotted HORIZONTALLY and the RESPONSE variable is plotted VERTICALLY. The full name of the regression line is the least squares regression line, and it is also known as the line of best fit.

In y = a + bx, a is the y-intercept and b is the gradient of the line. The values of b and a are found from the following results:

$$b = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the x and y data respectively.

It is possible to use the regression line to make estimates of the mean value of the response variable (y) for any value of the explanatory variable within the range of the data. This is called INTERPOLATION. N.B. You must remember that you do not know what happens outside the range of the data, so predicting values of the response variable outside the data range, called EXTRAPOLATION, is unreliable and must be viewed with caution.
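
As a rough illustration of the calculations described in these notes, the short Python sketch below works through the column totals (A) to (E), the correlation coefficient r, and the regression line of y on x. The five data pairs and the function names (summary_sums, pmcc, regression_line, estimate) are made up for this example and are not part of the original notes.

```python
from math import sqrt


def summary_sums(xs, ys):
    """Return (Sxy, Sxx, Syy) computed from the column totals (A) to (E)."""
    n = len(xs)
    A = sum(xs)                             # total of the x column
    B = sum(ys)                             # total of the y column
    C = sum(x * x for x in xs)              # total of the x^2 column
    D = sum(y * y for y in ys)              # total of the y^2 column
    E = sum(x * y for x, y in zip(xs, ys))  # total of the xy column
    sxy = E - A * B / n
    sxx = C - A * A / n
    syy = D - B * B / n
    return sxy, sxx, syy


def pmcc(xs, ys):
    """Product-moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy = summary_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Return (a, b) for the regression line of y on x, y = a + b*x."""
    sxy, sxx, _ = summary_sums(xs, ys)
    n = len(xs)
    b = sxy / sxx                       # gradient
    a = sum(ys) / n - b * sum(xs) / n   # intercept: mean of y minus b * mean of x
    return a, b


def estimate(a, b, x, xs):
    """Estimate y at x, refusing to extrapolate outside the observed x range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the data range (extrapolation)")
    return a + b * x


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pmcc(xs, ys)
a, b = regression_line(xs, ys)
print(f"r = {r:.3f}")
print(f"y = {a:.1f} + {b:.1f}x")
print(f"estimate at x = 3.5: {estimate(a, b, 3.5, xs):.1f}")

# Coding the x values as 10x + 3 leaves r unchanged (up to rounding error).
coded_xs = [10 * x + 3 for x in xs]
print(f"r after coding = {pmcc(coded_xs, ys):.3f}")
```

For this data the totals are (A) = 15, (B) = 20, (C) = 55, (D) = 86 and (E) = 66, giving Sxy = 6, Sxx = 10 and Syy = 6, so r is about 0.775 and the line is y = 2.2 + 0.6x. Raising an error outside the data range is only one way of reflecting the caution about extrapolation; a warning would serve equally well.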
Description
Notes that introduce and explain correlation and linear regression.