Do Neural Nets Learn Statistical Laws behind Natural Language?
Shuntaro Takahashi, Kumiko Tanaka-Ishii

TL;DR
This paper empirically shows that LSTM neural language models can reproduce key statistical laws of natural language, such as Zipf's law and Heaps' law, but struggle to reproduce long-range correlation, a limitation that could guide future architecture improvements.
Contribution
It provides empirical evidence that neural language models can learn statistical laws of language and highlights their limitations in capturing long-range correlations.
Findings
LSTM models reproduce Zipf's law effectively.
LSTM models reproduce Heaps' law effectively.
LSTM models struggle to reproduce long-range correlation.
Abstract
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
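To make the two laws named in the abstract concrete: Zipf's law says that a word's frequency falls off roughly as a power of its frequency rank, and Heaps' law says that vocabulary size grows roughly as a power of text length. The sketch below shows one way to measure both on a token sequence, such as text sampled from a language model. It is a minimal illustration, not the authors' procedure: the function names (`rank_frequency`, `vocabulary_growth`, `loglog_slope`) are hypothetical, and estimating the exponent by a least-squares fit in log-log space is an assumption made here for simplicity.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent word first (Zipf's law)."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token (Heaps' law)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    Assumption: a straight-line fit in log-log space is used as a rough
    power-law exponent estimate; it is not taken from the paper.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


if __name__ == "__main__":
    # Toy input; a real check would use a long generated or natural text.
    tokens = "the cat sat on the mat and the dog sat on the rug".split()

    ranks, freqs = zip(*rank_frequency(tokens))
    print("Zipf slope estimate:", loglog_slope(ranks, freqs))

    growth = vocabulary_growth(tokens)
    lengths = range(1, len(growth) + 1)
    print("Heaps slope estimate:", loglog_slope(lengths, growth))
```

On a text that follows these laws, the Zipf slope would be negative (close to -1 in the classic formulation) and the Heaps slope would lie between 0 and 1. The toy sentence above is far too short to show either behavior reliably.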
