Extending Expectation Propagation for Graphical Models : Extending Expectation Propagation for Graphical Models Yuan (Alan) Qi
Joint work with Tom Minka
Motivation : Motivation Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency.
Need a method that better balances the trade-off between accuracy and efficiency.
Motivation : Motivation Computational Time Error Current
Techniques
Outline : Outline Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
Poisson tracking
Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work
Outline : Outline Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
Poisson tracking
Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work
Graphical Models : Graphical Models
Inference on Graphical Models : Inference on Graphical Models Bayesian inference techniques:
Belief propagation (BP): Kalman filtering /smoothing, forward-backward algorithm
Monte Carlo: Particle filter/smoothers, MCMC
Loopy BP: typically efficient, but not accurate on general loopy graphs
Monte Carlo: accurate, but often not efficient
Expectation Propagation in a Nutshell : Expectation Propagation in a Nutshell Approximate a probability distribution by simpler parametric terms:
For directed graphs:
For undirected graphs:
Each approximation term lives in an exponential family (e.g. Gaussian)
EP in a Nutshell : EP in a Nutshell The approximate term minimizes the following KL divergence by moment matching: Where the leave-one-out approximation is
Limitations of Plain EP : Limitations of Plain EP Can be difficult or expensive to analytically compute the needed moments in order to minimize the desired KL divergence.
Can be expensive to compute and maintain a valid approximation distribution q(x), which is coherent under marginalization.
Tree-structured q(x):
Three Extensions : Three Extensions 1. Instead of choosing the approximate term to minimize the following KL divergence: use other criteria. 2. Use numerical approximation to compute moments: Quadrature or Monte Carlo. 3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
Efficiency vs. Accuracy : Efficiency vs. Accuracy Computational Time Error Extended EP ? Monte Carlo Loopy BP (Factorized EP)
Outline : Outline Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
Poisson tracking
Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work
Object Tracking : Object Tracking Guess the position of an object given noisy observations Object
Bayesian Network : Bayesian Network (random walk) e.g. want distribution of x’s given y’s
Approximation : Approximation Factorized and Gaussian in x
Message Interpretation : Message Interpretation = (forward msg)(observation msg)(backward msg) Forward Message Backward Message Observation Message
Extensions of EP : Extensions of EP Instead of matching moments, use any method for approximate filtering.
Examples: statistical linearization, unscented Kalman filter (UKF), mixture of Kalman filters
Turn any deterministic filtering method into a smoothing method!
All methods can be interpreted as finding linear/Gaussian approximations to original terms.
Use quadrature or Monte Carlo for term approximations
Example: Poisson Tracking : Example: Poisson Tracking is an integer valued Poisson variate with mean
Poisson Tracking Model : Poisson Tracking Model
Extended EP vs. Monte Carlo: Accuracy : Extended EP vs. Monte Carlo: Accuracy Variance Mean
Accuracy/Efficiency Tradeoff : Accuracy/Efficiency Tradeoff
Bayesian network for Wireless Signal Detection : Bayesian network for Wireless Signal Detection si: Transmitted signals
xi: Channel coefficients for digital wireless communications
yi: Received noisy observations
Extended-EP Joint Signal Detection and Channel Estimation : Extended-EP Joint Signal Detection and Channel Estimation Turn mixture of Kalman filters into a smoothing method
Smoothing over the last observations
Observations before act as prior for the current estimation
Computational Complexity : Computational Complexity Expectation propagation O(nLd2)
Stochastic mixture of Kalman filters O(LMd2)
Rao-blackwellised particle smoothers O(LMNd2)
n: Number of EP iterations (Typically, 4 or 5)
d: Dimension of the parameter vector
L: Smooth window length
M: Number of samples in filtering (Often larger than 500)
N: Number of samples in smoothing (Larger than 50)
EP is about 5,000 times faster than Rao-blackwellised particle smoothers.
Experimental Results : Experimental Results EP outperforms particle smoothers in efficiency with comparable accuracy. (Chen, Wang, Liu 2000) Signal-Noise-Ratio Signal-Noise-Ratio
Bayesian Networks for Adaptive Decoding : Bayesian Networks for Adaptive Decoding The information bits et are coded by a convolutional error-correcting encoder.
EP Outperforms Viterbi Decoding : EP Outperforms Viterbi Decoding Signal-Noise-Ratio
Outline : Outline Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
Poisson tracking
Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work
Inference on Loopy Graphs : Inference on Loopy Graphs Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.
4-node Loopy Graph : 4-node Loopy Graph Joint distribution is product of pairwise potentials for all edges: Want to approximate by a simpler distribution
BP vs. TreeEP : BP vs. TreeEP BP TreeEP
Junction Tree Representation : Junction Tree Representation p(x) q(x) Junction tree p(x) q(x) Junction tree
Two Kinds of Edges : Two Kinds of Edges On-tree edges, e.g., (x1,x4): exactly incorporated into the junction tree
Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure
KL Minimization : KL Minimization KL minimization moment matching
Match single and pairwise marginals of
Reduces to exact inference on single loops
Use cutset conditioning and
Matching Marginals on Graph : Matching Marginals on Graph (1) Incorporate edge (x3 x4) (2) Incorporate edge (x6 x7)
Drawbacks of Global Propagation : Drawbacks of Global Propagation Update all the cliques even when only incorporating one off-tree edge
Computationally expensive
Store each off-tree data message as a whole tree
Require large memory size
Solution: Local Propagation : Solution: Local Propagation Allow q(x) be non-coherent during the iterations. It only needs to be coherent in the end.
Exploit the junction tree representation: only locally propagate information within the minimal loop (subtree) that is directly connected to the off-tree edge.
Reduce computational complexity
Save memory
Slide39 : (1) Incorporate edge(x3 x4) (3) Incorporate edge (x6 x7) (2) Propagate evidence On this simple graph, local propagation runs roughly 2 times faster and uses 2 times less memory to store messages than plain EP
New Interpretation of TreeEP : New Interpretation of TreeEP Marry EP with Junction algorithm
Can perform efficiently over hypertrees and hypernodes
4-node Graph : 4-node Graph TreeEP = the proposed method
GBP = generalized belief propagation on triangles
TreeVB = variational tree
BP = loopy belief propagation = Factorized EP
MF = mean-field
Fully-connected graphs : Fully-connected graphs Results are averaged over 10 graphs with randomly generated potentials
TreeEP performs the same or better than all other methods in both accuracy and efficiency!
8x8 grids, 10 trials : 8x8 grids, 10 trials
TreeEP versus BP and GBP : TreeEP versus BP and GBP TreeEP is always more accurate than BP and is often faster
TreeEP is much more efficient than GBP and more accurate on some problems
TreeEP converges more often than BP and GBP
Outline : Outline Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
Poisson tracking
Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work
Conclusions : Conclusions Extend EP on graphical models:
Instead of minimizing KL divergence, use other sensible criteria to generate messages. Effectively turn any deterministic filtering method into a smoothing method.
Use quadrature to approximate messages.
Local propagation to save the computation and memory in tree structured EP.
Conclusions : Conclusions Extended EP algorithms outperform state-of-art inference methods on graphical models in the trade-off between accuracy and efficiency Computational Time Error State-of-art
Techniques
Future Work : Future Work More extensions of EP:
How to choose a sensible approximation family (e.g. which tree structure)
More flexible approximation: mixture of EP?
Error bound?
Bayesian conditional random fields
EP for optimization (generalize max-product)
More real-world applications, e.g., classification of gene expression data.
Classifying Colon Cancer Data by Predictive Automatic Relevance Determination : Classifying Colon Cancer Data by Predictive Automatic Relevance Determination The task: distinguish normal and cancer samples
The dataset: 22 normal and 40 cancer samples with 2000 features per sample.
The dataset was randomly split 100 times into 50 training and 12 testing samples.
SVM results from Li et al. 2002
End : End Contact information:
yuanqi@media.mit.edu