Transient Chaos in BERT
Katsuma Inoue, Soh Ohara, Yasuo Kuniyoshi, and Kohei Nakajima

TL;DR
This paper studies the dynamical properties of ALBERT and finds that the pre-trained model exhibits transient chaos, which may enhance its NLP capabilities by increasing its expressive power.
Contribution
The paper is the first to analyze ALBERT's dynamics, showing that pre-training induces transient chaos that may improve performance on NLP tasks.
Findings
Pre-trained ALBERT shows higher-dimensional stable trajectories.
Pre-trained ALBERT exhibits longer-lasting transient chaos than a randomly initialized ALBERT.
Chaotic dynamics may play a role in effective language processing.
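Transient chaos means that, for a finite stretch of the iteration, nearby trajectories separate exponentially before the dynamics settle. A common diagnostic, though not necessarily the one the authors use, is a finite-time Lyapunov exponent. You follow a reference trajectory and a slightly perturbed copy, record the log growth of their separation at each step, rescale the separation back to a fixed size (Benettin-style renormalization), and average the rates over windows. The sketch below is a minimal version that assumes a generic map `step` acting on plain Python lists. The names `local_expansion_rates` and `windowed_mean` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(c * c for c in v))


def local_expansion_rates(step: Callable[[Vector], Vector], x0: Vector,
                          n_steps: int, eps: float = 1e-8) -> List[float]:
    """Per-step log growth of a small perturbation (Benettin-style).

    Hypothetical helper: `step` is any map x_t -> x_{t+1}, e.g. one pass
    through a shared layer. Returns log(d_{t+1} / eps) for each step.
    """
    dim = len(x0)
    # Start the perturbation along the normalized all-ones direction so it
    # is not confined to a single coordinate axis.
    direction = [1.0 / math.sqrt(dim)] * dim
    x = list(x0)
    y = [a + eps * u for a, u in zip(x, direction)]
    rates: List[float] = []
    for _ in range(n_steps):
        x, y = step(x), step(y)
        diff = [b - a for a, b in zip(x, y)]
        d = _norm(diff)
        if d == 0.0:
            # Trajectories merged numerically: record maximal contraction
            # and restart the perturbation from the default direction.
            rates.append(float("-inf"))
            y = [a + eps * u for a, u in zip(x, direction)]
            continue
        rates.append(math.log(d / eps))
        # Rescale the separation back to eps, keeping its current direction.
        y = [a + eps * c / d for a, c in zip(x, diff)]
    return rates


def windowed_mean(rates: List[float], window: int) -> List[float]:
    """Finite-time Lyapunov exponents averaged over consecutive windows."""
    return [sum(rates[i:i + window]) / window
            for i in range(0, len(rates) - window + 1, window)]
```

Two simple cases show how to read the output. For the linear map `[0.5 * v[0], 2.0 * v[1]]` started at the origin, the rates converge to ln 2 ≈ 0.693, which indicates expansion. For `[0.5 * v[0], 0.5 * v[1]]`, every rate is ln 0.5, which indicates contraction. Under transient chaos you would expect positive window means early in the trajectory, followed by negative ones once the dynamics settle.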
Abstract
Language is an outcome of our complex and dynamic human interactions, and the technique of natural language processing (NLP) is hence built on human linguistic activities. Bidirectional Encoder Representations from Transformers (BERT) has recently gained popularity by establishing state-of-the-art scores on several NLP benchmarks. A Lite BERT (ALBERT) is, as its name suggests, a lightweight version of BERT, in which the number of parameters is reduced by repeatedly applying the same neural network, the Transformer's encoder layer. By pre-training its parameters on a massive amount of natural language data, ALBERT can convert input sentences into versatile high-dimensional vectors potentially capable of solving multiple NLP tasks. In that sense, ALBERT can be regarded as a well-designed high-dimensional dynamical system whose operator is the Transformer's encoder, and…
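The abstract treats ALBERT as a dynamical system: one set of layer weights is applied repeatedly, so the hidden state evolves as x_{t+1} = F(x_t). The toy below is a minimal sketch of that view. The hypothetical `shared_layer` is a residual tanh update followed by a layer-norm-style rescaling, loosely following BERT's post-norm layout. It is a stand-in only, not ALBERT's encoder: it has no self-attention, no feed-forward sublayer, and no learned normalization parameters. The names `make_weights` and `run_shared_stack`, and the `gain` parameter, are also illustrative assumptions.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    # Zero-mean, unit-variance rescaling (no learned gain/bias in this toy).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def shared_layer(x: Vector, w: Matrix) -> Vector:
    # Hypothetical stand-in for an encoder layer: a residual connection
    # around a tanh nonlinearity, followed by normalization.
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return layer_norm([xi + hi for xi, hi in zip(x, h)])


def run_shared_stack(x0: Vector, w: Matrix, depth: int) -> List[Vector]:
    # Cross-layer sharing: the same weights w are reused at every depth,
    # so the stack is the iterated map x_{t+1} = F(x_t).
    traj = [list(x0)]
    for _ in range(depth):
        traj.append(shared_layer(traj[-1], w))
    return traj


def make_weights(dim: int, gain: float, seed: int = 0) -> Matrix:
    # Random Gaussian weights with std gain / sqrt(dim) (illustrative only).
    rng = random.Random(seed)
    std = gain / math.sqrt(dim)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]
```

With `dim = 8` and `depth = 12`, `w` holds 64 weights however deep the stack is. Giving each of the 12 layers its own matrix would need 768. This weight sharing is how the abstract says ALBERT reduces BERT's parameter count. It is also what lets the whole stack be read as a single map F iterated over depth.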
Taxonomy
Topics: Topic Modeling · Neural Networks and Reservoir Computing · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · LAMB · Linear Warmup With Linear Decay · Residual Connection · WordPiece · ALBERT
