Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations

Matthias Lindemann; Alexander Koller; Ivan Titov

arXiv:2407.04543·cs.CL·July 8, 2024


TL;DR

This paper proposes a pre-training method that enhances Transformers' structural inductive biases by training on syntactic transformations, improving few-shot learning and generalization in syntactic and semantic tasks.

Contribution

Introducing intermediate pre-training on synthetically generated syntactic transformations of dependency trees to strengthen Transformers' structural inductive biases, improving performance on syntactic and semantic tasks.

Findings

01

Pre-training improves few-shot syntactic task performance.

02

Enhanced structural generalization in semantic parsing.

03

Intermediate pre-training produces attention heads that track the syntactic transformations being applied.

Abstract

Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from enhanced structural inductive biases for seq2seq tasks, especially those involving syntactic transformations, such as converting active to passive voice or semantic parsing. In this paper, we propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training to perform synthetically generated syntactic transformations of dependency trees given a description of the transformation. Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking, and also improves structural generalization for semantic parsing. Our analysis shows that the intermediate pre-training leads to attention heads…
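The central idea in the abstract, transforming a dependency tree according to a description of the transformation, can be illustrated with a toy example. The sketch below is a hypothetical illustration under stated assumptions, not the paper's transformation inventory or its way of encoding descriptions: it assumes a description maps dependency relation labels (e.g. `nsubj`, `obj`) to one of three invented operations (`front`, `back`, `bracket`) and linearizes the tree accordingly. The names `Node`, `linearize`, and `make_pair`, the operation names, and the text serialization of the description are all made up for this example.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependency-tree node (hypothetical representation for this sketch)."""
    word: str
    deprel: str  # relation to the head; "root" for the root
    left: list[Node] = field(default_factory=list)   # dependents before the head, in order
    right: list[Node] = field(default_factory=list)  # dependents after the head, in order


def linearize(node: Node, description: dict[str, str]) -> list[str]:
    """Linearize a subtree, applying the (invented) per-relation operations.

    Relations missing from `description` keep their original position.
    """
    before: list[str] = []
    after: list[str] = []
    for side, deps in (("left", node.left), ("right", node.right)):
        for dep in deps:
            tokens = linearize(dep, description)
            op = description.get(dep.deprel, "keep")
            if op == "bracket":
                # Wrap the dependent's subtree in brackets, then keep its position.
                tokens = ["[", *tokens, "]"]
                op = "keep"
            if op == "front" or (op == "keep" and side == "left"):
                before.extend(tokens)
            else:  # "back", or "keep" for a dependent originally after the head
                after.extend(tokens)
    return before + [node.word] + after


def make_pair(tree: Node, description: dict[str, str]) -> tuple[str, str]:
    """Build an illustrative (input, target) pre-training pair.

    Serializing the description as text is an assumption of this sketch only.
    """
    desc = " ".join(f"{rel}:{op}" for rel, op in sorted(description.items()))
    source = f"{desc} | {' '.join(linearize(tree, {}))}"
    target = " ".join(linearize(tree, description))
    return source, target


# Toy tree for "the cat chased a mouse".
tree = Node("chased", "root",
            left=[Node("cat", "nsubj", left=[Node("the", "det")])],
            right=[Node("mouse", "obj", left=[Node("a", "det")])])

if __name__ == "__main__":
    print(make_pair(tree, {"obj": "front", "nsubj": "back"}))
```

With the description `{"obj": "front", "nsubj": "back"}`, the toy sentence "the cat chased a mouse" becomes "a mouse chased the cat". Generating many random descriptions over parsed sentences in this way yields an unlimited supply of structure-sensitive training pairs, which is the kind of signal the intermediate pre-training relies on.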


Code & Models

Repositories

namednil/step · PyTorch · Official

Models

🤗 namednil/STEP · model · 4 downloads

Videos

Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations · Underline

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Intelligent Tutoring Systems and Adaptive Learning · Neural Networks and Applications

Methods: Attention Is All You Need · Sigmoid Activation · Linear Layer · Tanh Activation · Multi-Head Attention · Long Short-Term Memory · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing