LSTMs Exploit Linguistic Attributes of Data

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith

arXiv:1805.11653·cs.CL·April 9, 2019

TL;DR

LSTMs trained on natural language data recall tokens from much longer sequences than LSTMs trained on non-language sequential data, and they solve the recall task by using a subset of their neurons to count input timesteps.

Contribution

The paper shows that linguistic attributes of the training data improve an LSTM's ability to memorize its input, and it identifies a subset of neurons that the model uses to count timesteps.

Findings

01

LSTMs trained on natural language recall tokens from much longer sequences than LSTMs trained on non-language sequential data.

02

LSTMs use dedicated neurons for counting input timesteps.

03

The patterns and structure of natural language are hypothesized to help LSTMs learn by providing approximate ways of reducing loss.
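
Finding 02 can be illustrated with a hand-wired unit. The sketch below is not the paper's learned model: it assumes a single LSTM cell whose forget gate, input gate, and candidate value are all saturated near 1 by a large bias, so its cell state grows by roughly 1 per timestep. The function name `counting_cell_states` and the parameter `gate_bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def counting_cell_states(num_steps, gate_bias=10.0):
    """Return the cell state of one hand-wired LSTM unit after each timestep.

    Assumption (illustrative only): the unit ignores its input, and a large
    bias saturates the forget gate, input gate, and candidate near 1, so the
    standard update c_t = f * c_{t-1} + i * g behaves like c_t ~ c_{t-1} + 1.
    """
    forget_gate = sigmoid(gate_bias)   # close to 1: keep the running count
    input_gate = sigmoid(gate_bias)    # close to 1: always write
    candidate = math.tanh(gate_bias)   # close to 1: write an increment of ~1
    cell = 0.0
    states = []
    for _ in range(num_steps):
        cell = forget_gate * cell + input_gate * candidate
        states.append(cell)
    return states


if __name__ == "__main__":
    for step, value in enumerate(counting_cell_states(5), start=1):
        print(step, round(value, 3))
```

A unit like this carries the current position in its cell state, which is one way a network could know when it has reached the token it needs to recall.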

Abstract

While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training data on the learnability of LSTMs remains an open question.
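
The recall task in the abstract can be sketched as a data generator. The abstract says only that the model must recall elements from its input, so the details here are assumptions: the target is taken to be the token at the middle position, and the non-language baseline is taken to be token ids drawn uniformly at random. The name `make_recall_example` is hypothetical.

```python
import random


def make_recall_example(seq_len, vocab_size, rng):
    """Build one (tokens, target) pair for a recall task.

    Assumptions (not stated in the abstract): tokens are integer ids sampled
    uniformly at random, and the target is the token at the middle position.
    """
    tokens = [rng.randrange(vocab_size) for _ in range(seq_len)]
    target = tokens[seq_len // 2]
    return tokens, target


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is reproducible
    tokens, target = make_recall_example(seq_len=7, vocab_size=100, rng=rng)
    print(tokens, target)
```

Under this setup, a language-data variant would replace the uniform sampling with token sequences drawn from a text corpus while keeping the same target rule, so that only the input distribution differs between conditions.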

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory