Updater-Extractor Architecture for Inductive World State Representations
Arseny Moskvichev, James A. Liu

TL;DR
This paper introduces a transformer-based Updater-Extractor architecture that can process sequences of arbitrary length, with the aim of improving world state retention and inductive generalization in NLP models. The approach is supported by both a theoretical result and empirical evaluation.
Contribution
The paper presents a novel transformer architecture and training procedure that let a model incorporate new information over long sequences, beyond what a fixed-size context window allows.
Findings
The model handles sequences of arbitrary length.
The model achieves strong inductive generalization.
The model shows promising results on interpretability tasks.
Abstract
Developing NLP models traditionally involves two stages: training and application. Retention of information acquired after training (at application time) is architecturally limited by the size of the model's context window (in the case of transformers), or by the practical difficulties associated with long sequences (in the case of RNNs). In this paper, we propose a novel transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length and refine its knowledge about the world based on linguistic inputs. We explicitly train the model to incorporate incoming information into its world state representation, obtaining strong inductive generalization and the ability to handle extremely long-range dependencies. We prove a lemma that provides a theoretical basis for our approach. The result also provides insight into success and…
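The abstract describes a model that folds incoming information into a world state representation and can then be queried from that state. The control flow this implies can be sketched as a toy loop. This is only an illustration of the update-then-extract pattern, not the paper's transformer: the names `updater`, `extractor`, and `run`, the dictionary used as the state, and the `(entity, value)` fact format are all assumptions made here for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy types (not from the paper): the "world state" is a plain
# dict, standing in for a learned state representation.
State = Dict[str, str]
Fact = Tuple[str, str]  # (entity, value)


def updater(state: State, chunk: List[Fact]) -> State:
    """Fold one chunk of new input into the state and return the new state."""
    new_state = dict(state)
    for entity, value in chunk:
        new_state[entity] = value  # later information overrides earlier
    return new_state


def extractor(state: State, query: str) -> str:
    """Answer a query from the state alone, without re-reading past input."""
    return state.get(query, "unknown")


def run(stream: Iterable[Fact], chunk_size: int = 2) -> State:
    """Process a stream of any length, one fixed-size chunk at a time."""
    state: State = {}
    buffer: List[Fact] = []
    for fact in stream:
        buffer.append(fact)
        if len(buffer) == chunk_size:
            state = updater(state, buffer)
            buffer = []
    if buffer:  # fold in the final, possibly shorter, chunk
        state = updater(state, buffer)
    return state


stream = [
    ("door", "open"),
    ("light", "off"),
    ("door", "closed"),
    ("cat", "asleep"),
    ("light", "on"),
]
final_state = run(stream)
print(extractor(final_state, "door"))
```

In this sketch only the current chunk and the state are held at any step, so the length of the stream does not affect how much input is processed at once. How the paper's model represents and trains the state is not shown here.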
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Attention Is All You Need · Softmax · Layer Normalization · Residual Connection · Adam
