Word Interdependence Exposes How LSTMs Compose Representations
Naomi Saphra, Adam Lopez

TL;DR
This paper introduces a new measure of word interdependence in LSTM language models, based on how words interact at the internal gates, to study how hierarchical representations are composed. Synthetic experiments show that high interdependence can impair generalization and that parent constituents are learned by relying on effective representations of their children.
Contribution
The paper proposes a novel measure of word interdependence in LSTMs and validates it with experiments on synthetic data and on English language data.
Findings
High interdependence can hinder generalization on synthetic data.
Parent constituents are learned by relying on effective representations of their children, rather than by learning long-range relations independently.
Interdependence is higher for syntactically linked word pairs in English.
Abstract
Recent work in NLP shows that LSTM language models capture compositional structure in language data. For a closer look at how these representations are composed hierarchically, we present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates. To explore how compositional representations arise over training, we conduct simple experiments on synthetic data, which illustrate our measure by showing how high interdependence can hurt generalization. These synthetic experiments also illustrate a specific hypothesis about how hierarchical structures are discovered over the course of training: that parent constituents rely on effective representations of their children, rather than on learning long-range relations independently. We further support this measure with experiments on English language data, where interdependence is…
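The abstract states that the measure is based on how words interact at the LSTM's internal gates, but it does not give the formula on this page. As a rough illustration of why a gate creates interactions at all, the sketch below computes a second-order (mixed) finite difference for two additive contributions to one sigmoid gate's pre-activation. This toy definition, and the names `gate_interaction` and `bias`, are hypothetical assumptions for illustration; they are not the paper's measure.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid, the nonlinearity of LSTM gates."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_interaction(a: float, b: float, bias: float = 0.0) -> float:
    """Hypothetical interaction score for contributions a and b to one gate.

    If the gate were linear, the effect of adding both inputs would equal the
    sum of the effects of adding each alone, and this score would be zero.
    Any nonzero remainder is the part of the gate output that depends on
    a and b jointly rather than separately.
    """
    both = sigmoid(bias + a + b)
    only_a = sigmoid(bias + a)
    only_b = sigmoid(bias + b)
    neither = sigmoid(bias)
    return both - only_a - only_b + neither


if __name__ == "__main__":
    print(gate_interaction(0.0, 3.0))              # one input absent: no interaction
    print(gate_interaction(2.0, 2.0))              # saturating region
    print(gate_interaction(2.0, 2.0, bias=-4.0))   # convex region below zero
```

The last two calls give about -0.28 and +0.28: the sign comes only from the sigmoid's curvature, which is convex below zero and concave above it, so two inputs can either reinforce or dampen each other depending on where the gate operates.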
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
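The Methods tags refer to the standard LSTM cell, in which sigmoid-activated input, forget, and output gates control a tanh-squashed cell update. For readers who want to see where these gates sit, here is a minimal one-unit sketch of the common LSTM recurrence (with a forget gate). The parameter layout and the names `lstm_step`, `run`, and the gate-name keys are illustrative assumptions, not code from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical parameters for a one-unit LSTM:
# gate name -> (input weight, recurrent weight, bias).
Params = dict[str, tuple[float, float, float]]


def lstm_step(params: Params, x: float, h_prev: float, c_prev: float) -> tuple[float, float]:
    """One step of a scalar LSTM; returns the new (hidden, cell) state."""

    def pre(gate: str) -> float:
        # Pre-activation: weighted current input plus weighted previous hidden state.
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))         # how much of the candidate to write
    f = sigmoid(pre("forget"))        # how much of the old cell to keep
    o = sigmoid(pre("output"))        # how much of the cell to expose
    g = math.tanh(pre("candidate"))   # candidate cell update in (-1, 1)
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run(params: Params, xs: list[float]) -> list[float]:
    """Run the cell over a sequence from a zero state; return hidden states."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states
```

Using a single hidden unit keeps the arithmetic visible; real models use vector-valued gates with weight matrices, but each gate still multiplies together quantities computed from the current input and the previous state, which is where contributions from different words can interact.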
