Diffusion of Context and Credit Information in Markovian Models
Y. Bengio, P. Frasconi

TL;DR
This paper investigates how the ergodicity of transition matrices in Markovian models such as HMMs makes long-term context hard to learn, and shows that sparse, near-deterministic transitions reduce the diffusion of context and credit information for learning methods based on continuous optimization.
Contribution
It demonstrates that reducing ergodicity, by making transition matrices sparse or near-deterministic, improves the propagation of long-term context (forward in time) and of credit information (backward in time) in Markovian models.
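One standard way to make the role of ergodicity quantitative is Dobrushin's ergodicity coefficient of a row-stochastic matrix $A$. This is a textbook Markov-chain result offered as background, not necessarily the argument used in the paper. For two state distributions $p, q$, written as row vectors, it bounds how fast their difference is erased over $t$ steps:

```latex
\tau(A) = \tfrac{1}{2}\max_{i,j}\sum_k \lvert A_{ik} - A_{jk}\rvert
        = 1 - \min_{i,j}\sum_k \min(A_{ik}, A_{jk}),
\qquad
\lVert (p - q)A \rVert_1 \le \tau(A)\,\lVert p - q \rVert_1,
\qquad
\tau(AB) \le \tau(A)\,\tau(B)
\;\Longrightarrow\;
\lVert (p - q)A^{t} \rVert_1 \le \tau(A)^{t}\,\lVert p - q \rVert_1 .
```

If some column of $A$ is strictly positive, then $\tau(A) \le 1 - \sum_k \min_i A_{ik} < 1$ and information about the initial state decays geometrically in $t$. A permutation matrix has $\tau(A) = 1$, and a near-deterministic matrix has $\tau(A)$ close to 1, so the decay is slow. The backward direction obeys the dual bound $\max_i (A^{t}g)_i - \min_i (A^{t}g)_i \le \tau(A)^{t}\,(\max_k g_k - \min_k g_k)$ for an error signal $g$, so credit is flattened at the same rate.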
Findings
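Sparse and near-deterministic transition matrices reduce the diffusion of both context and credit information.
Ergodic transition matrices hinder the learning of long-term context in Markovian models such as HMMs.
The results apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.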
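The diffusion of context described in the Findings above can be illustrated numerically. The sketch below is a hypothetical illustration, not the authors' code: it measures how much information about the initial state survives T steps of a homogeneous Markov chain, using the Dobrushin ergodicity coefficient of the T-step matrix (a standard Markov-chain quantity assumed here, not taken from the paper). A value of 0 means every starting state leads to the same distribution; 1 means some pair of starting states remains fully distinguishable. The helper names and both example matrices are made up for illustration.

```python
import numpy as np


def dobrushin(P: np.ndarray) -> float:
    """Dobrushin ergodicity coefficient: half the largest L1 distance between two rows of P."""
    n = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(n) for j in range(n))


def context_after(A: np.ndarray, T: int) -> float:
    """Distinguishability of initial states after T steps, i.e. tau of the T-step matrix A^T."""
    return dobrushin(np.linalg.matrix_power(A, T))


def near_deterministic_cycle(n: int, eps: float) -> np.ndarray:
    """Hypothetical example: follow a cyclic permutation with prob. 1 - eps, else jump uniformly."""
    A = np.full((n, n), eps / n)
    for i in range(n):
        A[i, (i + 1) % n] += 1.0 - eps
    return A


# A dense (fully ergodic) random transition matrix: every entry is strictly positive.
rng = np.random.default_rng(0)
n = 4
dense = rng.random((n, n)) + 0.1
dense /= dense.sum(axis=1, keepdims=True)

# A sparse, near-deterministic matrix: transition probabilities close to 0 or 1.
sparse = near_deterministic_cycle(n, eps=0.01)

for T in (1, 10, 50):
    print(f"T={T:3d}  dense={context_after(dense, T):.2e}  sparse={context_after(sparse, T):.3f}")
```

For this particular sparse matrix the coefficient works out to exactly (1 - eps)^T, about 0.61 at T = 50, whereas for the dense matrix it is bounded by tau(dense)^T and shrinks geometrically toward zero.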
Abstract
This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes learning to represent long-term context in sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden-state representation of long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
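The credit side described in the abstract can be sketched the same way. Assuming, purely for illustration, a Markov chain without emissions whose initial distribution is a softmax over logits `theta` (a hypothetical parameterization, not the paper's), the gradient of log P(s_T = target) with respect to `theta` measures how much credit reaches the initial state. When every row of the T-step matrix is nearly identical, the column picked out by `target` is nearly constant and this gradient nearly vanishes. The gradient is computed here in closed form as a stand-in for the credit a continuous-optimization method would propagate backwards in time; it is not the paper's derivation.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def credit_to_initial_state(A: np.ndarray, theta: np.ndarray, T: int, target: int) -> np.ndarray:
    """Gradient of log P(s_T = target) w.r.t. the logits theta of the initial-state distribution."""
    pi0 = softmax(theta)
    AT = np.linalg.matrix_power(A, T)      # T-step transition matrix
    pT = pi0 @ AT                          # state distribution at time T
    g = AT[:, target] / pT[target]         # dL/dpi0: credit assigned to each initial state
    return pi0 * (g - pi0 @ g)             # chain rule through the softmax Jacobian


def near_deterministic_cycle(n: int, eps: float) -> np.ndarray:
    """Hypothetical example: follow a cyclic permutation with prob. 1 - eps, else jump uniformly."""
    A = np.full((n, n), eps / n)
    for i in range(n):
        A[i, (i + 1) % n] += 1.0 - eps
    return A


n, T, target = 4, 50, 0
theta = np.zeros(n)                        # start from a uniform initial distribution
rng = np.random.default_rng(0)
dense = rng.random((n, n)) + 0.1           # dense, fully ergodic transition matrix
dense /= dense.sum(axis=1, keepdims=True)
sparse = near_deterministic_cycle(n, eps=0.01)

print("dense  gradient norm:", np.linalg.norm(credit_to_initial_state(dense, theta, T, target)))
print("sparse gradient norm:", np.linalg.norm(credit_to_initial_state(sparse, theta, T, target)))
```

With T = 50 the dense matrix leaves a numerically negligible gradient, while the near-deterministic one still yields a norm of roughly 0.5, because its 50-step matrix keeps about 0.99^50 ≈ 0.61 of its probability mass on a deterministic path.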
