Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer

Yanshuai Cao; Peng Xu

arXiv:1905.11978·cs.LG·February 25, 2020·1 cites

TL;DR

This paper introduces a novel regularizer based on mutual information estimation to enhance the learning of long-range dependencies in sequence data, leading to better language modeling and generation quality.

Contribution

It proposes a new regularizer that explicitly maximizes mutual information between sequence segments, improving long-range dependency modeling in language tasks.

Findings

01. Increases mutual information of sequence segments.

02. Leads to higher likelihood on holdout data.

03. Improves generation quality.

Abstract

In this work, we develop a novel regularizer to improve the learning of long-range dependency of sequence data. Applied to language modelling, our regularizer expresses the inductive bias that sequence variables should have high mutual information even though the model might not see abundant observations for complex long-range dependency. We show how the 'next sentence prediction (classification)' heuristic can be derived in a principled way from our mutual information estimation framework, and be further extended to maximize the mutual information of sequence variables. The proposed approach is not only effective at increasing the mutual information of segments under the learned model but, more importantly, also leads to a higher likelihood on holdout data and improved generation quality. Code is released at https://github.com/BorealisAI/BMI.
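
The link between next sentence prediction and mutual information estimation can be illustrated with a minimal sketch. It assumes a generic classification-based bound and is not taken from the released BorealisAI/BMI code: a discriminator scores real consecutive segment pairs against shuffled pairs, and its binary cross-entropy loss yields a lower bound on the Jensen-Shannon divergence between the joint distribution of segments and the product of their marginals. The function names (`log_sigmoid`, `nsp_loss`, `jsd_lower_bound`) and the raw score inputs are hypothetical, chosen only for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nsp_loss(pos_scores, neg_scores):
    """Binary cross-entropy of a 'next sentence' classifier.

    pos_scores: logits for true (consecutive) segment pairs.
    neg_scores: logits for mismatched (shuffled) segment pairs.
    """
    pos_term = -sum(log_sigmoid(s) for s in pos_scores) / len(pos_scores)
    # log(1 - sigmoid(s)) equals log(sigmoid(-s))
    neg_term = -sum(log_sigmoid(-s) for s in neg_scores) / len(neg_scores)
    return pos_term + neg_term


def jsd_lower_bound(pos_scores, neg_scores):
    """Sample estimate of a lower bound on JSD(joint || product of marginals).

    Uses the standard identity 2 * JSD >= log 4 - BCE loss, which holds
    for any discriminator and is tight for the optimal one.
    """
    return 0.5 * (math.log(4.0) - nsp_loss(pos_scores, neg_scores))


if __name__ == "__main__":
    # An uninformative classifier (all logits zero) gives a bound near 0.
    print(jsd_lower_bound([0.0, 0.0], [0.0, 0.0]))
    # A confident, correct classifier approaches the maximum, log 2.
    print(jsd_lower_bound([10.0, 10.0], [-10.0, -10.0]))
```

Under this reading, lowering the classification loss tightens the estimate of how dependent the two segments are, which is why a classification heuristic can act as a mutual information estimator. The paper's own estimator and training procedure may differ in detail from this sketch.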

Code & Models

Repositories

BorealisAI/BMI
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications