The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston

TL;DR
This paper introduces a benchmark for evaluating how well language models capture meaning in children's books. It separates the prediction of semantic content words from the prediction of syntactic function words, and shows that explicit long-term memory representations help with the former.
Contribution
It proposes a test that separates semantic prediction from syntactic prediction, shows that models with explicit long-term memory outperform state-of-the-art neural language models at predicting semantic content words, and finds that the amount of text stored in each memory representation strongly affects performance.
Findings
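Models that store explicit representations of long-term context outperform state-of-the-art neural language models at predicting semantic content words, but show no such advantage for syntactic function words.
The amount of text encoded in a single memory representation strongly influences performance: the sweet spot lies between single words and full sentences.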
Applying the principle to the CNN QA benchmark achieves state-of-the-art results.
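The window-size finding above can be made concrete with a small sketch of how a passage might be split into memories at three granularities: single words, fixed-width word windows, and full sentences. This is an illustration only, not the authors' implementation. The function names, the naive sentence splitter, the example passage, and the choice to centre one window on every token are all assumptions made for this example.

```python
from typing import List

Memory = List[str]

def split_sentences(text: str) -> List[List[str]]:
    """Naive splitter (assumption): break on '.', lowercase, tokenize on whitespace."""
    return [s.split() for s in text.lower().split(".") if s.strip()]

def word_memories(sentences: List[List[str]]) -> List[Memory]:
    """Smallest granularity: every word is its own memory."""
    return [[w] for s in sentences for w in s]

def window_memories(sentences: List[List[str]], width: int) -> List[Memory]:
    """Middle granularity: one memory per token, holding `width` words
    centred on that token (shorter at the edges of the passage)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    tokens = [w for s in sentences for w in s]
    half = width // 2
    return [tokens[max(0, i - half): i + half + 1] for i in range(len(tokens))]

def sentence_memories(sentences: List[List[str]]) -> List[Memory]:
    """Largest granularity: every sentence is one memory."""
    return [list(s) for s in sentences]

# Hypothetical passage, used only to show the three granularities.
passage = "Alice saw a rabbit. The rabbit ran away."
sentences = split_sentences(passage)

words = word_memories(sentences)
windows = window_memories(sentences, width=3)
full = sentence_memories(sentences)
```

With this passage, the word-level split yields eight one-word memories, the sentence-level split yields two, and the width-3 windows sit in between: each holds a word together with its immediate neighbours. Varying `width` is one simple way to explore the trade-off the finding describes.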
Abstract
We introduce a new test of how well language models capture meaning in children's books. Unlike standard language modelling benchmarks, it distinguishes the task of predicting syntactic function words from that of predicting lower-frequency words, which carry greater semantic content. We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful…
Taxonomy
Topics: Memory Processes and Influences · Reading and Literacy Development · Child and Animal Learning Development
