Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis

Kelly W. Zhang; Samuel R. Bowman

arXiv:1809.10040·cs.CL·January 8, 2019·41 cites

TL;DR

This paper compares four pretraining objectives for LSTM models (language modeling, translation, skip-thought, and autoencoding) and finds that language modeling most effectively captures syntactic information, which benefits transfer learning. It also notes that randomly initialized models perform surprisingly well on syntactic tasks.

Contribution

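It provides a controlled comparison of four pretraining objectives on syntactic learning, holding constant the quantity and genre of the training data as well as the LSTM architecture, and highlights language modeling's advantage for syntax-related transfer learning.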

Findings

01 Representations from language models outperform those from the other objectives on syntactic tasks.

02 Randomly initialized LSTMs perform well on syntactic tasks when given ample data.

03 Language modeling is the most effective pretraining task for syntax transfer learning.

Abstract

Recent work using auxiliary prediction task classifiers to investigate the properties of LSTM representations has begun to shed light on why pretrained representations, like ELMo (Peters et al., 2018) and CoVe (McCann et al., 2017), are so beneficial for neural language understanding models. We still, though, do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn. With this in mind, we compare four objectives---language modeling, translation, skip-thought, and autoencoding---on their ability to induce syntactic and part-of-speech information. We make a fair comparison between the tasks by holding constant the quantity and genre of the training data, as well as the LSTM architecture. We find that representations from language models consistently perform best on our syntactic auxiliary prediction tasks,…
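The auxiliary prediction setup the abstract refers to can be sketched roughly as follows: representations from a frozen encoder are fed to a small classifier, and the classifier's held-out accuracy is read as a measure of how much of the target information (for example, part-of-speech tags) the representations encode. This is a minimal sketch rather than the authors' implementation. The names `train_probe` and `probe_accuracy` are hypothetical, the probe here is a plain softmax classifier, and the "representations" are synthetic vectors standing in for LSTM hidden states.

```python
import numpy as np


def train_probe(features, labels, num_classes, lr=0.01, epochs=200):
    """Fit a softmax classifier on fixed features with gradient descent.

    Only the probe parameters W and b are updated; the features, which stand
    in for frozen encoder states, are never modified.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, features, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    preds = (features @ W + b).argmax(axis=1)
    return float((preds == labels).mean())


# Synthetic stand-in data (an assumption for illustration, not the paper's data):
# each "token" has a class label and a noisy vector clustered around its class center.
rng = np.random.default_rng(0)
num_classes, dim = 3, 8
centers = rng.normal(size=(num_classes, dim)) * 3.0
labels = rng.integers(0, num_classes, size=600)
features = centers[labels] + rng.normal(size=(600, dim))

train_X, test_X = features[:500], features[500:]
train_y, test_y = labels[:500], labels[500:]

W, b = train_probe(train_X, train_y, num_classes)
acc = probe_accuracy(W, b, test_X, test_y)
print(f"held-out probe accuracy: {acc:.3f}")
```

In a real comparison, the same probe would be trained separately on representations from each pretrained encoder, and from a randomly initialized one, so that differences in probe accuracy can be attributed to the encoder rather than to the classifier.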

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · GloVe Embeddings · Location-based Attention · Sequence to Sequence · Contextual Word Vectors · Bidirectional LSTM · Softmax · ELMo · Long Short-Term Memory