Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis
Kelly W. Zhang, Samuel R. Bowman

TL;DR
This paper compares four pretraining objectives for LSTM models and finds that language modeling captures syntactic information most effectively, which makes it the strongest choice for syntax-related transfer learning. It also notes that randomly-initialized models perform surprisingly well on syntactic tasks.
Contribution
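It provides a controlled comparison of four pretraining objectives (language modeling, translation, skip-thought, and autoencoding) on syntactic learning, holding the quantity and genre of the training data and the LSTM architecture constant, and highlights language modeling's advantage for syntax-related transfer learning.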
Findings
Representations from language models perform best on the syntactic auxiliary prediction tasks.
Randomly-initialized LSTMs perform well on syntactic tasks when ample training data is available.
Language modeling is the most effective of the four pretraining objectives for syntax-related transfer learning.
Abstract
Recent work using auxiliary prediction task classifiers to investigate the properties of LSTM representations has begun to shed light on why pretrained representations, like ELMo (Peters et al., 2018) and CoVe (McCann et al., 2017), are so beneficial for neural language understanding models. We still, though, do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn. With this in mind, we compare four objectives---language modeling, translation, skip-thought, and autoencoding---on their ability to induce syntactic and part-of-speech information. We make a fair comparison between the tasks by holding constant the quantity and genre of the training data, as well as the LSTM architecture. We find that representations from language models consistently perform best on our syntactic auxiliary prediction tasks,…
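The auxiliary prediction setup mentioned in the abstract can be sketched roughly as follows: keep the encoder's representations frozen, train a small classifier on them to predict a label such as a part-of-speech tag, and read held-out accuracy as a signal of how much of that information the representation exposes. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the encoder states are synthetic stand-ins produced by a hypothetical `make_synthetic_states` helper, the `signal` parameter is an invented knob for how strongly a tag is encoded, and the probe is a plain softmax classifier, which is an assumption about the classifier's form.

```python
import numpy as np


def make_synthetic_states(rng, n, dim, num_tags, signal):
    """Hypothetical stand-in for frozen encoder states paired with tags.

    Each tag gets a random center; `signal` scales how strongly the tag
    is encoded in the state. This is synthetic data, not real LSTM output.
    """
    tags = rng.integers(0, num_tags, size=n)
    centers = rng.normal(size=(num_tags, dim))
    states = signal * centers[tags] + rng.normal(size=(n, dim))
    return states, tags


def train_probe(features, labels, num_classes, lr=0.1, epochs=300):
    """Fit a softmax probe on frozen features with batch gradient descent."""
    n, dim = features.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    preds = (features @ weights + bias).argmax(axis=1)
    return float((preds == labels).mean())


def evaluate_representation(rng, signal, num_tags=5, dim=32, n=2000, n_train=1500):
    """Train the probe on one split of the frozen states, score it on the rest."""
    states, tags = make_synthetic_states(rng, n, dim, num_tags, signal)
    weights, bias = train_probe(states[:n_train], tags[:n_train], num_tags)
    return probe_accuracy(weights, bias, states[n_train:], tags[n_train:])


rng = np.random.default_rng(0)
acc_strong = evaluate_representation(rng, signal=1.0)  # tag information present
acc_weak = evaluate_representation(rng, signal=0.0)    # tag information absent
print(f"probe accuracy, informative states:   {acc_strong:.3f}")
print(f"probe accuracy, uninformative states: {acc_weak:.3f}")
```

Only the probe's parameters are updated here; the states themselves are never modified, which is what lets the accuracy gap be attributed to the representation rather than to the classifier. Comparing pretraining objectives would amount to repeating the same probe training on states from each encoder.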
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · GloVe Embeddings · Location-based Attention · Sequence to Sequence · Contextual Word Vectors · Bidirectional LSTM · Softmax · ELMo · Long Short-Term Memory
