How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi

TL;DR
This paper investigates how LSTMs encode syntactic information in natural text. It shows that their internal context update vectors are approximately quantized and that some context-vector dimensions correlate with phrase-structure depth, which helps explain how LSTMs represent syntax.
Contribution
Through quantization and correlation analyses, it shows that LSTM context vectors encode syntactic nesting depth and phrase-structure information.
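A correlation analysis of this kind can be sketched as a Pearson correlation between one context-vector dimension's activation and the phrase depth (e.g. VP depth) at each word. The function name `pearson` and the toy numbers are hypothetical illustrations, not data or code from the paper. In a real analysis the activations would come from a trained LSTM language model and the depths from parsed text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy trace: one context-vector dimension vs. VP depth at each word.
vp_depth = [0, 1, 2, 2, 1, 0]
activation = [0.0, 0.9, 1.8, 1.9, 1.1, 0.1]

r = pearson(activation, vp_depth)
print(f"{r:.3f}")
```

A value close to 1 means the dimension rises and falls with the phrase depth; computing this for every dimension and keeping the highest scores is one way to locate depth-tracking units.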
Findings
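Context update vectors (the outputs of the internal gates) are approximately quantized to binary or ternary values, which helps the model count nesting depth.
Activations of some context-vector dimensions correlate strongly with the depth of phrase structures such as VP and NP.
With regularization, a small number of context-vector components can predict whether a word is inside a phrase.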
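To see why approximately binary or ternary context updates help with counting, here is a minimal sketch of one idealized LSTM cell. It assumes the standard cell update c_t = f_t * c_{t-1} + i_t * g_t, but with hand-set, saturated gates rather than trained weights. The bracket encoding, `count_depth`, and `weight` are hypothetical; the mechanism mirrors the counting behaviour the abstract attributes to Suzgun et al. (2019) for Dyck languages.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical token encoding: "(" opens a phrase, ")" closes one, any other word is 0.
TOKEN_SIGNAL = {"(": 1.0, ")": -1.0}

def count_depth(tokens, weight=10.0):
    """Run one idealized LSTM cell with saturated gates over a bracketed sequence.

    Returns the cell state c_t and the context update i_t * g_t at every step.
    """
    c = 0.0
    states, updates = [], []
    for tok in tokens:
        x = TOKEN_SIGNAL.get(tok, 0.0)
        f = sigmoid(weight)        # forget gate ~1: keep the running count
        i = sigmoid(weight)        # input gate ~1: admit the update
        g = math.tanh(weight * x)  # candidate ~ +1, 0 or -1
        update = i * g             # approximately ternary context update
        c = f * c + update
        states.append(c)
        updates.append(update)
    return states, updates

tokens = "( the ( big dog ) ( barked ) )".split()
states, updates = count_depth(tokens)
depths = [round(c) for c in states]
print(depths)  # [1, 1, 2, 2, 2, 1, 2, 2, 1, 0]

# Largest distance of any update from the nearest value in {-1, 0, +1}.
print(max(min(abs(u - q) for q in (-1, 0, 1)) for u in updates))
```

Because every update lies within about 5e-5 of -1, 0 or +1, the cell state stays close to an integer and rounding it recovers the nesting depth. Forget-gate values noticeably below 1, or updates far from {-1, 0, +1}, would make the state drift away from integers.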
Abstract
Long Short-Term Memory recurrent neural network (LSTM) is widely used and known to capture informative long-term syntactic dependencies. However, how such information is reflected in its internal vectors for natural text has not yet been sufficiently investigated. We analyze them by learning a language model where syntactic structures are implicitly given. We empirically show that the context update vectors, i.e. outputs of internal gates, are approximately quantized to binary or ternary values to help the language model count the depth of nesting accurately, as Suzgun et al. (2019) recently showed for synthetic Dyck languages. For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures, such as VP and NP. Moreover, with an L1 regularization, we also found that it can accurately predict whether a word is inside…
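The claim that a small number of context-vector components can predict whether a word is inside a phrase can be illustrated with L1-regularized logistic regression. This is a sketch under assumptions: the solver (proximal gradient descent, also called ISTA), the hyperparameters `lam`, `lr` and `iters`, the helper names, and the toy data are all hypothetical choices, not the paper's setup. The L1 penalty is what drives uninformative components to exactly zero.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    """Proximal step for t * |v|: shrink toward 0, snapping small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.1, lr=1.0, iters=2000):
    """L1-regularized logistic regression trained by proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        # Residuals p_i - y_i of the mean logistic loss.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        grad_b = sum(resid) / n
        # Gradient step on the smooth loss, then soft-threshold (the bias is not penalized).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: 3-dimensional "context vectors" for four words.
# Component 0 tracks phrase membership; components 1 and 2 are small noise.
X = [[1.0, 0.05, -0.02],
     [0.9, -0.03, 0.04],
     [-1.0, 0.02, 0.05],
     [-0.8, -0.05, -0.01]]
y = [1, 1, 0, 0]  # 1 = word is inside the phrase

w, b = fit_l1_logistic(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) > 0.5) for row in X]
print(w, preds)
```

In this toy setting the noise components never receive a gradient larger than `lam`, so soft-thresholding keeps their weights at exactly 0 and a single component makes the prediction.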
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
