SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation

Matthias Lindemann; Alexander Koller; Ivan Titov

arXiv:2310.00796 · cs.CL · July 11, 2024

TL;DR

This paper introduces a method that improves seq2seq models, specifically Transformers, by pre-training them to simulate finite state transducers (FSTs) given their descriptions. The pre-training improves systematic generalization and few-shot learning.

Contribution

The authors propose a pre-training approach that injects a structural inductive bias into a Transformer by training it on synthetic data to simulate FSTs, which improves generalization on structured tasks.

Findings

01 Enhanced systematic generalization to FST-like tasks.

02 Improved few-shot learning performance.

03 Models internalize FST state dynamics.

Abstract

Strong inductive biases enable learning from little data and help generalization outside of the training distribution. Popular neural architectures such as Transformers lack strong structural inductive biases for seq2seq NLP tasks on their own. Consequently, they struggle with systematic generalization beyond the training distribution, e.g. with extrapolating to longer inputs, even when pre-trained on large amounts of text. We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data. Specifically, we inject an inductive bias towards Finite State Transducers (FSTs) into a Transformer by pre-training it to simulate FSTs given their descriptions. Our experiments show that our method imparts the desired inductive bias, resulting in improved systematic generalization and better few-shot…
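The idea of "simulating an FST given its description" can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a deterministic FST stored as a transition table, and the text linearization of the FST (the `;` and `|` separators and the `eps` marker) is a hypothetical encoding chosen only for illustration. It runs the FST on an input string and pairs the FST description plus input (source) with the FST's output (target), which is the general shape of a synthetic pre-training example.

```python
from typing import Dict, Set, Tuple

# (state, input symbol) -> (next state, output string)
Transitions = Dict[Tuple[int, str], Tuple[int, str]]


def run_fst(transitions: Transitions, start: int, finals: Set[int], text: str) -> str:
    """Apply a deterministic FST to `text` and return the emitted output."""
    state = start
    emitted = []
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            raise ValueError(f"no transition from state {state} on {symbol!r}")
        state, output = transitions[key]
        emitted.append(output)
    if state not in finals:
        raise ValueError(f"input ended in non-final state {state}")
    return "".join(emitted)


def make_pretraining_example(
    transitions: Transitions, start: int, finals: Set[int], text: str
) -> Tuple[str, str]:
    """Build a (source, target) pair: FST description plus input -> FST output.

    The linearization below is a hypothetical format, not the paper's encoding.
    """
    description = " ; ".join(
        f"{s} {a} {o if o else 'eps'} {t}"
        for (s, a), (t, o) in sorted(transitions.items())
    )
    source = f"{description} | {text}"
    target = run_fst(transitions, start, finals, text)
    return source, target


# Toy FST: rewrite the first "a" of every run of "a"s as "x".
TRANSITIONS: Transitions = {
    (0, "a"): (1, "x"),
    (0, "b"): (0, "b"),
    (1, "a"): (1, "a"),
    (1, "b"): (0, "b"),
}

source, target = make_pretraining_example(TRANSITIONS, 0, {0, 1}, "aabab")
print(source)
print(target)  # xabxb
```

In this sketch a seq2seq model would be trained to map `source` to `target` across many randomly sampled FSTs and inputs, so that it must track the FST state implicitly rather than memorize a single transformation.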

Code & Models

Repositories

namednil/sip (PyTorch, official)

Models

namednil/sip-fst-tokenizer (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational Physics and Python Applications

Methods: Multi-Head Attention · Attention Is All You Need · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam