Predictive Representation Learning for Language Modeling

Qingfeng Lan; Luke Kumar; Martha White; Alona Fyshe

arXiv:2105.14214 · cs.CL · June 1, 2021

TL;DR

This paper introduces Predictive Representation Learning (PRL), a method that explicitly trains LSTMs to encode specific predictions, improving language modeling performance, convergence speed, and data efficiency.

Contribution

The paper proposes PRL, an approach that constrains LSTMs to encode explicit predictions, and reports that it significantly improves two strong language modeling methods.

Findings

01 PRL significantly improves language modeling results

02 PRL leads to faster convergence of models

03 PRL performs better with limited data

Abstract

To effectively perform the task of next-word prediction, long short-term memory networks (LSTMs) must keep track of many types of information. Some information is directly related to the next word's identity, but some is more secondary (e.g., discourse-level features or features of downstream words). Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task. In contrast, in reinforcement learning (RL), techniques that explicitly supervise representations to predict secondary information have been shown to be beneficial. Inspired by that success, we propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions, like those that might need to be learned implicitly. We show that PRL 1) significantly improves two strong language modeling methods, 2)…
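One way to picture "explicitly supervising" a representation on secondary predictions is as an auxiliary loss added to the usual next-word loss. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say which predictions PRL uses or how they are weighted, so the binary auxiliary targets, the function names, and the `aux_weight` parameter are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Next-word loss: negative log-probability of the target word."""
    return -math.log(softmax(logits)[target_index])


def binary_cross_entropy(logit, target):
    """Loss for one yes/no auxiliary prediction (target is 0.0 or 1.0)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def combined_loss(next_word_logits, next_word, aux_logits, aux_targets,
                  aux_weight=0.5):
    """Next-word loss plus a weighted mean of auxiliary prediction losses.

    ASSUMPTION: the auxiliary targets stand in for some "secondary"
    information (for example, whether a given word occurs later in the
    sentence). The actual PRL targets are not given in the abstract.
    """
    lm_loss = cross_entropy(next_word_logits, next_word)
    aux_losses = [binary_cross_entropy(l, t)
                  for l, t in zip(aux_logits, aux_targets)]
    aux_loss = sum(aux_losses) / len(aux_losses)
    return lm_loss + aux_weight * aux_loss


# Toy usage: a 4-word vocabulary and one auxiliary yes/no prediction.
loss = combined_loss([0.0, 0.0, 0.0, 0.0], 1, [0.0], [1.0])
```

Setting `aux_weight` to zero recovers plain next-word training, which makes the auxiliary term easy to ablate in a sketch like this.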

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory