Updater-Extractor Architecture for Inductive World State Representations

Arseny Moskvichev; James A. Liu

arXiv:2104.05500 · cs.CL · April 13, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a transformer-based Updater-Extractor architecture capable of handling arbitrarily long sequences, improving world state retention and inductive generalization in NLP models, with theoretical and empirical validation.

Contribution

The paper presents a novel transformer architecture and training method that enables models to incorporate new information over long sequences, surpassing traditional context limitations.

Findings

01. The model handles arbitrarily long sequences effectively.

02. It achieves strong inductive generalization.

03. It demonstrates promising results on interpretability tasks.

Abstract

Developing NLP models traditionally involves two stages - training and application. Retention of information acquired after training (at application time) is architecturally limited by the size of the model's context window (in the case of transformers), or by the practical difficulties associated with long sequences (in the case of RNNs). In this paper, we propose a novel transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length and refine its knowledge about the world based on linguistic inputs. We explicitly train the model to incorporate incoming information into its world state representation, obtaining strong inductive generalization and the ability to handle extremely long-range dependencies. We prove a lemma that provides a theoretical basis for our approach. The result also provides insight into success and…
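The update-then-extract loop described in the abstract can be illustrated with a toy sketch. This is not the authors' model: in the paper both components are transformer-based and the world state is a learned representation, whereas here `updater`, `extractor`, `run`, and the dictionary-based `WorldState` are hypothetical hand-written stand-ins, chosen only to show how carrying a state forward chunk by chunk removes the dependence on a context window.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy state: entity name -> last known location.
# (Assumption for illustration; the paper uses a learned representation.)
WorldState = Dict[str, str]


def updater(state: WorldState, chunk: List[str]) -> WorldState:
    """Toy stand-in for the Updater: fold one chunk of input into the state."""
    new_state = dict(state)  # copy so earlier states stay untouched
    for sentence in chunk:
        words = sentence.split()
        # Only sentences of the form "<entity> moved to <place>" change the state.
        if len(words) == 4 and words[1:3] == ["moved", "to"]:
            new_state[words[0]] = words[3]
    return new_state


def extractor(state: WorldState, query: str) -> Optional[str]:
    """Toy stand-in for the Extractor: answer a query from the state alone."""
    return state.get(query)


def run(chunks: Iterable[List[str]]) -> WorldState:
    """Process a stream chunk by chunk, carrying only the state forward."""
    state: WorldState = {}
    for chunk in chunks:
        state = updater(state, chunk)
    return state


stream = [
    ["Alice moved to kitchen", "Bob moved to garden"],
    ["It rained all day"] * 1000,  # long filler that never mentions Alice
    ["Bob moved to hallway"],
]
final_state = run(stream)
alice_location = extractor(final_state, "Alice")
bob_location = extractor(final_state, "Bob")
```

Because the extractor reads only the state, the fact about Alice stated in the first chunk remains available no matter how much input follows, which is the long-range behavior the abstract attributes to the architecture.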

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Attention Is All You Need · Softmax · Layer Normalization · Residual Connection · Adam