MIT Press, 1998. — 204 p.
A variety of problems in machine learning and digital communication deal with complex but structured natural or artificial systems. Natural patterns that we wish to classify automatically are the consequence of hierarchical, causal physical processes. Learning about the world in which we live requires that we extract useful sensory features and abstract concepts and then form a model of how these interact. Universal data compression involves estimating the probability distribution of a data source, which is often produced by some natural hierarchical process. Error-correcting codes used in telephone modems and deep-space communication consist of electrical signals that are linked together in a complex fashion determined by the code's design and the physical nature of the communication channel. Not only are these tasks characterized by complex structure, but they also contain random elements. Graphical models such as Bayesian belief networks and Markov random fields provide a way to describe the relationships between random variables in a complex stochastic system.
In this book, I use graphical models as an overarching framework to describe and solve problems in the areas of pattern classification, unsupervised learning, data compression, and channel coding. The book grew out of research I carried out during my doctoral program at the University of Toronto. Rather than a textbook, it is a treatise that covers several leading-edge areas of research in machine learning and digital communication.
The book begins with a review of graphical models and algorithms for inferring probabilities in these models, including the probability propagation algorithm, Markov chain Monte Carlo (Gibbs sampling), variational inference, and Helmholtz machines. I then turn to the practical problem of learning models for pattern classification. Results on the classification of handwritten digits show that Bayesian network pattern classifiers outperform other standard methods, such as the k-nearest neighbor method.
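To give a flavor of the first of these algorithms, the following is a minimal sketch of probability propagation (the sum-product recursion) on a three-node Markov chain with binary states. It is my illustration rather than code from the book, and the prior, transition, and evidence tables are invented numbers.

    import numpy as np

    # Probability propagation (sum-product) on a three-node Markov chain
    # with binary hidden states. All probability tables are made up.
    prior = np.array([0.6, 0.4])               # p(x1)
    trans = np.array([[0.7, 0.3],              # p(x_{t+1} | x_t)
                      [0.2, 0.8]])
    evid  = [np.array([0.9, 0.1]),             # local evidence p(y_t | x_t)
             np.array([0.4, 0.6]),
             np.array([0.3, 0.7])]

    T = len(evid)
    fwd = [None] * T                           # forward (past-to-future) messages
    fwd[0] = prior * evid[0]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans) * evid[t]

    bwd = [np.ones(2) for _ in range(T)]       # backward (future-to-past) messages
    for t in range(T - 2, -1, -1):
        bwd[t] = trans @ (evid[t + 1] * bwd[t + 1])

    for t in range(T):
        marg = fwd[t] * bwd[t]
        print(f"p(x{t + 1} | y) =", marg / marg.sum())

Running it prints the posterior marginal over each hidden variable given all three observations; on chains and other cycle-free graphs, this message-passing computation is exact.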
In the area of unsupervised learning, I show that with just a simple, local delta-rule learning algorithm (the "wake-sleep" algorithm), hybrids of Bayesian networks and neural networks can learn hierarchical structure from images. When Bayesian networks with hidden variables are used as source models for data compression, an exponentially large number of codewords is associated with each input pattern. It turns out that the code can nevertheless be used efficiently, by means of a new technique called "bits-back coding."
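To make the local delta rule concrete, here is a minimal sketch of wake-sleep learning in a one-hidden-layer sigmoid belief network; the layer sizes, learning rate, and toy data below are arbitrary choices of mine, not details from the book.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    sample = lambda p: (rng.random(p.shape) < p).astype(float)

    n_x, n_h, lr = 16, 4, 0.05                          # arbitrary sizes and step size
    G = rng.normal(0, 0.1, (n_h, n_x))                  # generative weights for p(x|h)
    g_bias, h_bias = np.zeros(n_x), np.zeros(n_h)       # generative biases; p(h) prior
    R = rng.normal(0, 0.1, (n_x, n_h))                  # recognition weights for q(h|x)
    r_bias = np.zeros(n_h)

    def wake_sleep_step(x):
        # Wake phase: recognize a hidden code for real data, then use the
        # delta rule to make the generative net reproduce what it saw.
        h = sample(sigmoid(x @ R + r_bias))
        h_bias += lr * (h - sigmoid(h_bias))
        g_err = x - sigmoid(h @ G + g_bias)
        G += lr * np.outer(h, g_err)
        g_bias += lr * g_err
        # Sleep phase: dream (h, x) from the generative net, then use the
        # delta rule to make the recognition net recover the dreamed cause.
        h_dream = sample(sigmoid(h_bias))
        x_dream = sample(sigmoid(h_dream @ G + g_bias))
        r_err = h_dream - sigmoid(x_dream @ R + r_bias)
        R += lr * np.outer(x_dream, r_err)
        r_bias += lr * r_err

    for _ in range(1000):                               # toy data: random binary "images"
        wake_sleep_step(sample(np.full(n_x, 0.3)))

Each phase adjusts one network using only locally available quantities: the wake phase trains the generative weights to reconstruct recognized data, and the sleep phase trains the recognition weights to invert the generative model on its own "dreams."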
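The accounting that rescues the code can be stated in a few lines; this is the standard bits-back argument in my notation, not an excerpt from the book. If the encoder selects a hidden state h with probability q(h|x) and then codes h and x using the source model, the choice of h itself conveys auxiliary information that the receiver can recover, refunding -log2 q(h|x) bits. The expected net code length is therefore

    \mathbb{E}_{q(h \mid x)}\big[ -\log_2 p(h) - \log_2 p(x \mid h) + \log_2 q(h \mid x) \big]
        = -\log_2 p(x) + D_{\mathrm{KL}}\big( q(h \mid x) \,\big\|\, p(h \mid x) \big)
        \;\ge\; -\log_2 p(x),

with equality exactly when q(h|x) is the true posterior p(h|x), so the exponential multiplicity of codewords costs nothing in that limit.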
The current best error-correcting decoding algorithms (e.g., turbo-decoding) are instances of "probability propagation" in various graphical models. These new schemes are rapidly closing the gap between the performance of practical channel coding systems and Shannon's fifty-year-old channel coding limit. The graphical model framework exposes the similarities between these codes and leads the way to a new class of "trellis-constrained codes" that also operate close to Shannon's limit.
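For a sense of the target these schemes are approaching, the limit itself is a one-line calculation from standard information theory, not a result from the book: for a rate-R code on the real-valued AWGN channel with unconstrained input, reliable communication requires Eb/N0 >= (2^(2R) - 1)/(2R).

    import math

    # Shannon limit on Eb/N0 for a rate-R code over the real AWGN channel:
    # capacity C = 0.5 * log2(1 + 2*R*Eb/N0) must be at least R, which
    # rearranges to Eb/N0 >= (2**(2*R) - 1) / (2*R).
    def shannon_limit_db(R):
        return 10.0 * math.log10((2.0 ** (2.0 * R) - 1.0) / (2.0 * R))

    print(shannon_limit_db(0.5))   # 0.0 dB for a rate-1/2 code

For a rate-1/2 code this evaluates to 0 dB, the figure against which practical decoders are measured.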
The book concludes with suggestions, from me and from other experts, for future directions of research in graphical models for machine learning and digital communication.
Probabilistic Inference in Graphical Models
Pattern Classification
Unsupervised Learning
Data Compression
Channel Coding
Future Research Directions