6.041/6.431 24. Classical Statistical Inference - II
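LECTURE 24
• Reference: Section 9.3

Outline
• Review
  – Maximum likelihood estimation
  – Confidence intervals
• Linear regression
• Binary hypothesis testing
  – Types of error
  – Likelihood ratio test (LRT)

Review
• Maximum likelihood estimation
  – Have model with unknown parameters: $X \sim p_X(x; \theta)$
  – Pick $\theta$ that "makes data most likely": $\max_\theta p_X(x; \theta)$
  – Compare to Bayesian MAP estimation: $\max_\theta p_{\Theta|X}(\theta \mid x)$, or equivalently $\max_\theta \frac{p_{X|\Theta}(x \mid \theta)\, p_\Theta(\theta)}{p_X(x)}$
• Sample mean estimate of $\theta = E[X]$: $\hat\Theta_n = (X_1 + \cdots + X_n)/n$
• $1-\alpha$ confidence interval: $P(\hat\Theta_n^- \le \theta \le \hat\Theta_n^+) \ge 1-\alpha$, for all $\theta$
• Confidence interval for the sample mean
  – Let $z$ be such that $\Phi(z) = 1 - \alpha/2$
  – Then
  $$P\left(\hat\Theta_n - \frac{z\sigma}{\sqrt{n}} \le \theta \le \hat\Theta_n + \frac{z\sigma}{\sqrt{n}}\right) \approx 1-\alpha$$

Regression
• Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Model: $y \approx \theta_0 + \theta_1 x$
  $$\min_{\theta_0, \theta_1} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2 \qquad (*)$$
• One interpretation: $Y_i = \theta_0 + \theta_1 x_i + W_i$, with $W_i \sim N(0, \sigma^2)$ i.i.d.
  – Likelihood function $f_{X,Y}(x, y; \theta)$ is
  $$c \cdot \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2\right\}$$
  – Take logs: maximizing the likelihood is the same as (*)
  – Least squares ↔ pretend the $W_i$ are i.i.d. normal

Linear regression
• Model: $y \approx \theta_0 + \theta_1 x$
  $$\min_{\theta_0, \theta_1} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2$$
• Solution (set derivatives to zero):
  $$\bar{x} = \frac{x_1 + \cdots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + \cdots + y_n}{n}$$
  $$\hat\theta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat\theta_0 = \bar{y} - \hat\theta_1 \bar{x}$$
• Interpretation of the form of the solution
  – Assume a model $Y = \theta_0 + \theta_1 X + W$, with $W$ independent of $X$ and with zero mean
  – Check that
  $$\theta_1 = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \frac{E\big[(X - E[X])(Y - E[Y])\big]}{E\big[(X - E[X])^2\big]}$$
  (since $W$ is independent of $X$, $\operatorname{cov}(X, Y) = \theta_1 \operatorname{var}(X)$)
  – The solution formula for $\hat\theta_1$ is a natural estimate of this ratio: it replaces the covariance and the variance with their sample counterparts

The world of linear regression
• Multiple linear regression:
  – Data: $(x_i, x_i', x_i'', y_i)$, $i = 1, \ldots, n$
  – Model: $y \approx \theta_0 + \theta x + \theta' x' + \theta'' x''$
  – Formulation:
  $$\min_{\theta_0, \theta, \theta', \theta''} \sum_{i=1}^n (y_i - \theta_0 - \theta x_i - \theta' x_i' - \theta'' x_i'')^2$$
• Choosing the right variables
  – Model $y \approx \theta_0 + \theta_1 h(x)$, e.g., $y \approx \theta_0 + \theta_1 x^2$
  – Work with data points $(y_i, h(x_i))$
  – Formulation:
  $$\min_{\theta_0, \theta_1} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 h(x_i))^2$$

The world of regression (ctd.)
• In practice, one also reports
  – Confidence intervals for the $\theta_i$
  – "Standard error" (estimate of $\sigma$)
  – $R^2$, a measure of "explanatory power"
• Some common concerns
  – Heteroskedasticity
  – Multicollinearity
  – Sometimes misused to conclude causal relations
  – etc.

Likelihood ratio test (LRT)
• Bayesian case (MAP rule): choose $H_1$ if
  $P(H_1 \mid X = x) > P(H_0 \mid X = x)$
  or
  $$\frac{P(X = x \mid H_1)\, P(H_1)}{P(X = x)} > \frac{P(X = x \mid H_0)\, P(H_0)}{P(X = x)}$$
  or
  $$\frac{P(X = x \mid H_1)}{P(X = x \mid H_0)} > \frac{P(H_0)}{P(H_1)} \qquad \text{(likelihood ratio test)}$$
• Nonbayesian version: choose $H_1$ if
  $$\frac{P(X = x; H_1)}{P(X = x; H_0)} > \xi \quad \text{(discrete case)}, \qquad \frac{f_X(x; H_1)}{f_X(x; H_0)} > \xi \quad \text{(continuous case)}$$
• The threshold $\xi$ trades off the two types of error
  – Choose $\xi$ so that $P(\text{reject } H_0; H_0) = \alpha$ (e.g., $\alpha = 0.05$)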
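The closed-form estimates $\hat\theta_0, \hat\theta_1$ from the linear regression slides can be checked numerically. Below is a minimal plain-Python sketch (no external libraries); the function names `fit_line` and `sum_squared_residuals` and the sample data are hypothetical, chosen only for illustration. The numerator and denominator of $\hat\theta_1$ are computed separately to mirror the covariance/variance interpretation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ theta0 + theta1 * x via the closed-form solution."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of (x_i - x_bar)(y_i - y_bar): n times the sample covariance.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of (x_i - x_bar)^2: n times the sample variance.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x_i are equal; theta1 is not identifiable")
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


def sum_squared_residuals(xs, ys, theta0, theta1):
    """The objective (*): sum over i of (y_i - theta0 - theta1 * x_i)^2."""
    return sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + 2x, so the fit recovers (1, 2).
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    print(fit_line(xs, ys))  # prints (1.0, 2.0)
```

For a model $y \approx \theta_0 + \theta_1 h(x)$, the same routine applies after transforming the inputs, e.g. `fit_line([x ** 2 for x in xs], ys)` for $h(x) = x^2$.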
Binary hypothesis testing
• Binary $\theta$; new terminology:
  – Null hypothesis $H_0$: $X \sim p_X(x; H_0)$ [or $f_X(x; H_0)$]
  – Alternative hypothesis $H_1$: $X \sim p_X(x; H_1)$ [or $f_X(x; H_1)$]
• Partition the space of possible data vectors
  – Rejection region $R$: reject $H_0$ iff data $\in R$
• Types of errors:
  – Type I (false rejection, false alarm): $H_0$ true, but rejected: $\alpha(R) = P(X \in R; H_0)$
  – Type II (false acceptance, missed detection): $H_0$ false, but accepted: $\beta(R) = P(X \notin R; H_1)$

MIT OpenCourseWare
http://ocw.mit.edu
6.041 / 6.431 Probabilistic Systems Analysis and Applied Probability
Fall 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
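The LRT recipe (fix $\alpha$, then pick $\xi$) and the error probabilities $\alpha(R)$ and $\beta(R)$ can be made concrete with a worked case. The sketch below assumes, purely for illustration, that under $H_0$ the observations $X_1, \ldots, X_n$ are i.i.d. $N(0, 1)$ and under $H_1$ they are i.i.d. $N(\mu, 1)$ with a known $\mu > 0$; the function names are hypothetical. In this model the log likelihood ratio is $\mu \sum_i x_i - n\mu^2/2$, which increases with the sample mean, so the LRT rejects $H_0$ exactly when the sample mean exceeds some $\gamma$. It uses only the standard library's `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def log_likelihood_ratio(xs, mu):
    """log of f_X(x; H1) / f_X(x; H0) for i.i.d. N(mu, 1) vs. N(0, 1) samples."""
    n = len(xs)
    return mu * sum(xs) - n * mu ** 2 / 2


def design_lrt(n, mu, alpha):
    """Choose the LRT threshold so that P(reject H0; H0) = alpha.

    Assumes mu > 0, so 'ratio > xi' is equivalent to 'sample mean > gamma'.
    Returns (log_xi, gamma, beta).
    """
    if mu <= 0:
        raise ValueError("this sketch assumes mu > 0")
    # Under H0 the sample mean is N(0, 1/n); reject in its upper alpha tail.
    gamma = STD_NORMAL.inv_cdf(1 - alpha) / math.sqrt(n)
    # Map the threshold on the sample mean back to the likelihood-ratio threshold.
    log_xi = n * mu * gamma - n * mu ** 2 / 2
    # Type II error: under H1 the sample mean is N(mu, 1/n); H0 is accepted when it is <= gamma.
    beta = STD_NORMAL.cdf((gamma - mu) * math.sqrt(n))
    return log_xi, gamma, beta


def lrt_rejects(xs, mu, log_xi):
    """Choose H1 (reject H0) iff the likelihood ratio exceeds xi."""
    return log_likelihood_ratio(xs, mu) > log_xi
```

Working with $\log \xi$ rather than $\xi$ avoids overflow for large $n$. Lowering `alpha` raises `gamma` and therefore `beta`, which is the trade-off between the two types of error that the threshold controls.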
Description
These lecture notes continue with Classical Statistical Inference (part II). The lesson reviews maximum likelihood estimation and confidence intervals, then covers linear regression and binary hypothesis testing, including the types of error and the likelihood ratio test (LRT).
Instructors: Prof. Dimitri Bertsekas, Prof. John Tsitsiklis
MIT Course Number: 6.041 / 6.431
Level: Undergraduate / Graduate
6.041 / 6.431 24. Classical statistical inference - II, Probabilistic Systems Analysis and Applied Probability, Electrical Engineering and Computer Science, Engineering, Massachusetts Institute of Technology: MIT OpenCourseWare, http://ocw.mit.edu (11-11-2011). License: Creative Commons BY-NC-SA: http://ocw.mit.edu/terms/#cc.