Mapping the Timescale Organization of Neural Language Models
Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher J. Honey

TL;DR
This paper introduces a neuroscience-inspired method to map and analyze the timescale organization of units in neural language models, revealing distinct functional classes and their roles in language processing.
Contribution
It presents a novel, model-free technique for mapping processing timescales in neural networks and uncovers the functional organization of units based on their timescales.
Findings
Long-timescale units track long-range syntactic dependencies.
A small subset of units (less than 15%) has long timescales, and the function of these units had not previously been explored.
'Controller' and 'integrator' units play distinct roles in language processing.
Abstract
In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the "processing timescales" of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies. Additionally, the mapping revealed a small subset of the network (less than 15% of units) with long timescales and whose function had not previously been explored. We next probed the functional organization of the network by…
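As a rough illustration of what "mapping a processing timescale" can mean, the toy sketch below feeds the same input segment to a single unit after two different preceding contexts and counts how many steps it takes for the context-induced difference in activation to fall to half its initial size. This is not the paper's procedure or its LSTM: the leaky-integrator unit, the `alpha` leak parameter, the half-decay criterion, and the function names `run_leaky_unit` and `context_timescale` are all assumptions made for the example.

```python
import random


def run_leaky_unit(inputs, alpha, h0=0.0):
    """Toy unit (an assumption, not an LSTM): h_t = alpha * h_{t-1} + (1 - alpha) * x_t."""
    h = h0
    trace = []
    for x in inputs:
        h = alpha * h + (1.0 - alpha) * x
        trace.append(h)
    return trace


def context_timescale(alpha, context_a, context_b, segment):
    """Count the steps into a shared segment until the activation difference
    caused by the two preceding contexts drops to half of its starting value.

    Returns None if the difference never halves within the segment.
    """
    if len(context_a) != len(context_b):
        raise ValueError("contexts must have equal length so the segment aligns")
    trace_a = run_leaky_unit(context_a + segment, alpha)
    trace_b = run_leaky_unit(context_b + segment, alpha)
    start = len(context_a)
    # Difference between the two conditions at the last context step.
    d0 = abs(trace_a[start - 1] - trace_b[start - 1])
    for k in range(len(segment)):
        if abs(trace_a[start + k] - trace_b[start + k]) <= 0.5 * d0:
            return k + 1
    return None


# Two different contexts followed by an identical segment (hypothetical data).
rng = random.Random(0)
context_a = [1.0] * 20
context_b = [-1.0] * 20
segment = [rng.uniform(-1.0, 1.0) for _ in range(60)]

# Map a timescale for three toy units with increasing leak parameters.
timescales = {
    alpha: context_timescale(alpha, context_a, context_b, segment)
    for alpha in (0.3, 0.9, 0.98)
}
print(timescales)
```

In this toy setting, units with a larger `alpha` retain the prior context for more steps and so receive a longer timescale, which is the kind of per-unit ordering a timescale map is meant to expose.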
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
