TL;DR
This paper investigates whether neural language models like BART and T5 encode dynamic, entity-based representations of meaning that support reasoning about the world, beyond surface-level word statistics.
Contribution
It presents evidence that pretrained models develop implicit, manipulable representations of entities and situations, functionally similar to linguistic models of dynamic semantics, and that these can be learned from text data alone.
Findings
Contextual word representations support a linear readout of each entity's current properties and relations.
Manipulating these representations has predictable effects on language generation.
The models encode dynamic, entity-based meaning representations that evolve throughout a discourse.
Abstract
Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity's current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data. Code and…
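The "linear readout" mentioned in the abstract can be illustrated with a small probing sketch. This is not the paper's code or its experimental setup: the vectors below are synthetic stand-ins for contextual representations, the binary property (for example, whether an entity is "open") is invented for illustration, and the function names `make_synthetic_representations`, `fit_linear_probe`, and `probe_predict` are hypothetical. The sketch only shows what it means for a property to be recoverable by a linear map.

```python
import numpy as np

def make_synthetic_representations(n=400, dim=32, seed=0):
    """Synthetic stand-ins for entity representations (an assumption, not model output).

    A binary property is encoded along one hidden direction, plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)           # 0 = property false, 1 = property true
    signs = 2.0 * labels - 1.0                    # map {0, 1} to {-1, +1}
    noise = rng.normal(scale=0.5, size=(n, dim))
    reps = noise + 2.0 * np.outer(signs, direction)
    return reps, labels

def fit_linear_probe(reps, labels):
    """Fit a linear probe (weights plus bias) by least squares on +/-1 targets."""
    X = np.hstack([reps, np.ones((len(reps), 1))])  # append a bias column
    y = 2.0 * labels - 1.0
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def probe_predict(w, reps):
    """Read the property off a representation with a single linear map."""
    X = np.hstack([reps, np.ones((len(reps), 1))])
    return (X @ w > 0).astype(int)

reps, labels = make_synthetic_representations()
train, held_out = slice(0, 300), slice(300, 400)
w = fit_linear_probe(reps[train], labels[train])
accuracy = float((probe_predict(w, reps[held_out]) == labels[held_out]).mean())
print(f"held-out probe accuracy: {accuracy:.2f}")
```

High held-out accuracy here only reflects how the synthetic data was built. With a real model, the representations would come from the encoder's hidden states and the labels from annotated entity states, and probe accuracy would be the quantity of interest.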
Taxonomy
Methods: Gated Linear Unit · Multi-Head Attention · Linear Layer · Dropout · Byte Pair Encoding · Attention Is All You Need · Adam · Inverse Square Root Schedule · Layer Normalization
