Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer
Yanshuai Cao, Peng Xu

TL;DR
This paper introduces a novel regularizer based on mutual information estimation to enhance the learning of long-range dependencies in sequence data, leading to better language modeling and generation quality.
Contribution
It proposes a new regularizer that explicitly maximizes mutual information between sequence segments, improving long-range dependency modeling in language tasks.
Findings
Increases the mutual information between sequence segments under the learned model.
Leads to higher likelihood on held-out data.
Improves generation quality.
Abstract
In this work, we develop a novel regularizer to improve the learning of long-range dependencies in sequence data. Applied to language modelling, our regularizer expresses the inductive bias that sequence variables should have high mutual information, even though the model might not see abundant observations of complex long-range dependencies. We show how the "next sentence prediction (classification)" heuristic can be derived in a principled way from our mutual information estimation framework, and further extended to maximize the mutual information of sequence variables. The proposed approach is not only effective at increasing the mutual information of segments under the learned model but, more importantly, leads to a higher likelihood on holdout data and improved generation quality. Code is released at https://github.com/BorealisAI/BMI.
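To make the link between next sentence prediction and mutual information estimation more concrete, here is a minimal sketch. It assumes a critic T(x, y) that scores a pair of segments and uses the Jensen-Shannon-style variational estimator common in the MI-estimation literature. We have not confirmed that this is the exact estimator in the released BMI code, and the names `jsd_mi_lower_bound`, `nsp_bce_loss`, and the toy critics are hypothetical. The sketch shows that the binary cross-entropy of an "is Y the real next segment?" classifier is exactly the negative of this estimate. Training such a classifier therefore maximizes a variational lower bound on an affine function (2·JSD − 2 log 2) of the Jensen-Shannon divergence between the joint distribution of segment pairs and the product of their marginals.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def jsd_mi_lower_bound(critic, pairs_joint, pairs_marginal):
    """JSD-style MI estimate: E_joint[-softplus(-T)] - E_marginal[softplus(T)].

    pairs_joint: (x, y) pairs where y really follows x (samples of P(X, Y)).
    pairs_marginal: mismatched pairs standing in for samples of P(X)P(Y).
    """
    pos = sum(-softplus(-critic(x, y)) for x, y in pairs_joint) / len(pairs_joint)
    neg = sum(softplus(critic(x, y)) for x, y in pairs_marginal) / len(pairs_marginal)
    return pos - neg


def nsp_bce_loss(critic, pairs_joint, pairs_marginal):
    """Binary cross-entropy of a 'next segment?' classifier with logits T(x, y)."""
    # BCE with logit t and label 1 is softplus(-t); with label 0 it is softplus(t).
    pos = sum(softplus(-critic(x, y)) for x, y in pairs_joint) / len(pairs_joint)
    neg = sum(softplus(critic(x, y)) for x, y in pairs_marginal) / len(pairs_marginal)
    return pos + neg


# Hypothetical toy data: segment x is always followed by segment x + 1.
segments = list(range(10))
joint = [(x, x + 1) for x in segments]                # true (segment, next segment) pairs
marginal = [(x, (x + 5) % 10 + 1) for x in segments]  # mismatched pairs (never x + 1)


def informative_critic(x, y):
    # Hypothetical critic: a confident logit whenever y really is the next segment.
    return 4.0 if y == x + 1 else -4.0


def constant_critic(x, y):
    # A critic that ignores its inputs cannot separate joint from marginal pairs.
    return 0.0


mi_good = jsd_mi_lower_bound(informative_critic, joint, marginal)
mi_flat = jsd_mi_lower_bound(constant_critic, joint, marginal)
bce_good = nsp_bce_loss(informative_critic, joint, marginal)
print(mi_good, mi_flat, bce_good)  # mi_flat is -2*log(2); bce_good is -mi_good
```

In a real language model, the critic would be a learned network over segment encodings, and the mismatched pairs would come from other positions or documents. Both choices are assumptions made for illustration here, not details taken from the paper.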
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
