Dissecting Contextual Word Embeddings: Architecture and Representation

Matthew E. Peters; Mark Neumann; Luke Zettlemoyer; Wen-tau Yih

arXiv:1808.08949 · cs.CL · October 1, 2018 · 6 cites

Open Access

TL;DR

This paper empirically compares different neural architectures for contextual word embeddings, revealing how they influence task performance and the nature of learned linguistic representations across layers.

Contribution

The paper presents a detailed empirical analysis of how the choice of biLM architecture (LSTM, CNN, or self attention) affects both end task accuracy and the qualitative properties of the contextual representations that are learned.

Findings

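01 All architectures learn contextual representations that outperform static word embeddings on four challenging NLP tasks.

02 Representations vary with network depth, from exclusively morphological at the word embedding layer toward more semantic at greater depth.

03 The architectures trade off speed against accuracy.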

Abstract

Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological based at the word embedding layer through local syntax based in the lower…
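To make the contrast between static word embeddings and contextual representations concrete, here is a minimal toy sketch. It is not the paper's biLM or any of the architectures it compares: the vocabulary, the random embedding table `E`, and the running-mean "forward" and "backward" passes are all assumptions chosen purely for illustration. It only shows that a static lookup returns the same vector for a word in every sentence, while a representation built from left and right context does not.

```python
import numpy as np

# Hypothetical toy vocabulary and random static embedding table (illustration only).
rng = np.random.default_rng(0)
vocab = ["the", "river", "bank", "loan", "approved", "flooded"]
index = {word: i for i, word in enumerate(vocab)}
E = rng.normal(size=(len(vocab), 8))


def static_embed(tokens):
    """Look up one fixed vector per token, ignoring context."""
    return E[[index[t] for t in tokens]]


def toy_contextual_embed(tokens):
    """Toy stand-in for a bidirectional encoder (NOT a real biLM).

    The forward half of each vector is the mean of the static vectors up to
    and including the token; the backward half is the mean from the token to
    the end of the sentence. The two halves are concatenated.
    """
    x = static_embed(tokens)
    n = len(tokens)
    fwd = np.stack([x[: i + 1].mean(axis=0) for i in range(n)])
    bwd = np.stack([x[i:].mean(axis=0) for i in range(n)])
    return np.concatenate([fwd, bwd], axis=1)


s1 = ["the", "river", "bank", "flooded"]
s2 = ["the", "bank", "approved", "the", "loan"]

# "bank" is at position 2 in s1 and position 1 in s2.
static_same = np.allclose(static_embed(s1)[2], static_embed(s2)[1])
contextual_same = np.allclose(
    toy_contextual_embed(s1)[2], toy_contextual_embed(s2)[1]
)
print("static vectors identical:", static_same)
print("contextual vectors identical:", contextual_same)
```

In a real pre-trained biLM the forward and backward halves would come from trained language models rather than running means, and there would be several layers of such representations rather than one.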

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory