6.441-16 Differential entropy, maximizing entropy
LECTURE 16Last time:Data compression • Coding theorem • Joint source and channel coding theorem • Converse • Robustness • Brain teaser • Lecture outlineDifferential entropy • Entropy rate and Burg’s theorem • AEP • Reading: Chapters 9, 11. Continuous random variablesWe consider continuous random variables with probability density functions (pdfs) X has pdf fX(x) Cumulative distribution function (CDF) FX(x)= P (X ≤ x)= � x fX(t)dt∞ pdfs are not probabilities and may be greater than 1 in particular for a discrete Z PαZ(αz)= PZ(z) but for continuous X P (αX ≤ x)= P (X ≤ x)= FX �x �� αx fX(t)dtαα −∞ so fαX(x)= dFX (x)= 1fX �x � dx αα Continuous random variablesIn general, for Y = g(X) Get CDF of Y : FY (y)= P(Y ≤ y) Differenttiat to get dFYfY (y)= (y)dy X: uniform on [0,2] Find pdf of Y = X3 Solution: FY (y)= P(Y ≤ y)= P(X3 ≤ y) (1) = P(X ≤ y 1/3)= 1 y 1/3 (2)2dFY 1 fY (y)= dy (y)= 6y2/3 Differential entropy Differential entropy: � +∞ � 1 � h(X)= fX(x) ln dx (3) −∞ fX(x) All definitions follow as before replacing PX with fX and summation with integration I(X; Y )� +∞ � +∞ � fX,Y (x, y) �= fX,Y (x, y) ln dxdy −∞ −∞ fX(x)fY (y)= D �fX,Y (x, y)||fX(x)fY (y)�= h(Y ) − h(Y |X)= h(X) − h(X|Y )Joint entropy is defined as h(Xn)= − � fXn(xn)ln �fXn(xn)� dx1 . . . dxn Differential entropy The chain rules still hold: h(X, Y )= h(X)+ h(YX)= h(Y )+ h(XY )||I((X, Y ); Z)= I(X; Z)+ I(Y ; ZY )|K-L distance D(fX(x)||fY (y)) = � fX(x) ln �fX (x)� fY (y) still remains non-negative in all cases Conditioning still reduces entropy, because differential entropy is concave in the input (Jensen’s inequality) Let f(x)= −x ln(x) then 1 f �(x)= −xx − ln(x) = − ln(x) − 1 and 1 f”(x)= − < 0 x for x> 0. Hence I(X; Y )= h(Y ) − h(Y |X) ≥ 0 Differential entropy H(X) ≥ 0 always and H(X)=0 for X a constant Let us consider h(X) for X constant For X constant fX(x)= δ(x) � +∞ � 1 � h(X)= fX(x) ln dx (4) −∞ fX(x) h(X) → −∞ Differential entropy is not always positive See 9.3 for discussion of relation between discrete and differential entropy Entropy under a transformation: h(X + c)= h(X) h(αX)= h(X)+ ln (α||) Maximizing entropy For H(Z), the uniform distribution maximiize entropy, yielding log(|Z|) The only constraint we had then was that the inputs be selected from the set Z We now seek a fX(x) that maximizes h(X) subject to some set of constraints fX(x) ≥ 0 � fX(x)dx =1 � fX(x)ri(x)dx = αi where {(r1,α1),..., (rm, αm)}is a set of constraints on X λ0−1+�m Let us consider fX(x)= ei=1 λiri(x). Let us show it achieves a maximum entropy �����Maximizing entropyConsider some other random variable Y with fy(y) pdf that satisfies the conditions but is not of the above form h(Y )= −fY (x) ln(fY (x))dx �fY (x) � fX(x) fY (x) lnfX(x) dx= − �()ln( ())ffdxxxYX−D(fY ||fX) − fY (x) ln(fX(x))dx =≤−⎛⎝λ0 − 1+⎞⎠m� m� i=1 fY (x)λiri(x)dx= − ⎛⎝λ0 − 1+⎞⎠fX(x)λiri(x)dx= − i=1= h(X) Special case: for a given variance, a Gaussiia distribution maximizes entropy For X ∼ N(0,σ2), h(X) = 1 ln(2πeσ2)2 Entropy rate and Burg’s theorem The differential entropy rate of a stochastic process {Xi} is defined to be limn→∞ h(Xn) n if it exists In the case of a stationary process, we can show that the differential entropy rate is limn→∞ h(Xn|Xn−1) The maximum entropy rate stochastic procees {Xi} satisfying the constraints E �XiXi+k � = αk, k =0, 1,...,p, ∀i is the pth order Gauss-Markov process of the form pXi = − � akXi−k +Ξi k=1 where the Ξis are IID ∼ N(0,σ2), independeen of past Xs and a1,a2, . . . , ap, σ2 are chosen to satisfy the constraints In particular, let X1, . . . , Xn satisfy the constraaint and let Z1, . . . , Zn be a Gaussian process with the same covariance matrix as X1, . . . , Xn. The entropy of Zn is at least as great as that of Xn . Entropy rate and Burg’s theoremFacts about Gaussians: -we can always find a Gaussian with any arbitrary autocorrelation function -for two jointly Gaussian random variables X and Y with an arbitrary covariance, we can always express Y = AX + Z for some matrix A and Z independent of X -if Y and X are jointly Gaussian random variables and Y = X + Z then Z must also be -a Gaussian random vector Xn has pdf 1 fXn(xn)= �√2π|ΛXn|�n e−1(xn−µXn)T Λ−1(xn−µXn)2Xnwhere Λ and µ denote autocovariance and mean, respectively -The entropy is h(Xn)= 12ln �(2πe)n|ΛXn|� Entropy rate and Burg’s theorem The constraints E �XiXi+k � = αk, k =0, 1,...,p, ∀i can be viewed as an autocorrelation constrrain By selecting the ais according to the Yule-Walker equations, that give p+1 equations ion p + 1 unknowns R(0) = − �pk=1 akR(−k)+ σ2 R(l)= − �pk=1 akR(l − k) (recall that R(k)= R(−k)) we can solve for a1,a2, . . . , ap, σ2 What is the entropy rate? nh(Xn)= � h(Xi|Xi−1)i=1n= � h(XiXi−1)|i−p i=1 n= � h(Ξi) i=1 AEP WLLN still holds:− 1 ln �fXn(xn)� →−E[ln(fX(x))] = h(X)n in probability for Xis IID Define V ol(A)= �A dx1 . . . dxn Define the typical set A(�n) as: �(x1, . . . , xn)s.t.|− 1 ln �fXn(xn)� − h(X)|≤ �� n By the WLLN, P (A� (n)) > 1 − � for n large enough � � AEP 1= fXn(xn)dx1 . . . dxn 1 ≥ � (n) e−n(h(X)+�)dx1 . . . dxn A� en(h(X)+�) ≥ V ol(A� (n)) For n large enough, P (A(�n)) > 1 − � so 1 − � ≤ � (n) fXn(xn)dx1 . . . dxn A� 1 − � ≤ � A(n) e−n(h(X)−�)dx1 . . . dxn 1 − � ≤ V ol(A(�n))e−n(h(X)−�) MIT OpenCourseWarehttp://ocw.mit.edu 6.441 Information Theory Spring 2010 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Description
This lecture explores ,Differential entropy,Entropy rate and Burg’s theorem
and AEP.Continuous random variables, Here we discussed about relation between discrete and differential entropy.Entropy rate and Burg’s theorem .There are some facts about Facts about Gaussians: we can always find a Gaussian with any arbitrary autocorrelation function.
Instructors: Prof. Muriel Médard, MIT Course Number:6.441 Level: Graduate, 6.441-16 Differential entropy, maximizing entropy, 6.441 Information Theory, Electrical Engineering and Computer Science, Engineering, Massachusetts Institute of Technology: MIT Open Course Ware, http://ocw.mit.edu (11-09-2011).License: Creative Commons BY-NC-SA: http://ocw.mit.edu/terms/#cc".
Presentation Transcript
Your Facebook Friends on WizIQ