Structural Self-Supervised Objectives for Transformers

Luca Di Liello

arXiv:2309.08272 · cs.CL · September 18, 2023

TL;DR

This paper introduces novel self-supervised objectives for Transformer pre-training, improving efficiency and downstream task performance, especially with limited labeled data, by aligning pre-training tasks with application structures.

Contribution

It proposes new pre-training objectives based on token swapping, together with structurally aligned pre-training tasks, that enhance NLP model performance without modifying the Transformer architecture.

Findings

01. RTS and C-RTS reduce pre-training time with comparable performance to MLM.

02. SLM outperforms MLM on certain tasks within the same computational budget.

03. Significant improvements on benchmarks like FEVER, ASNQ, WikiQA, and TREC-QA, especially with limited labeled data.

Abstract

This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swapping instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we propose self-supervised pre-training tasks that align structurally with downstream…
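To make the difference between the two label types concrete, the sketch below builds a token-swapped input together with RTS-style labels (was this token replaced?) and SLM-style labels (what was the original token?). It is a minimal illustration based only on the abstract, not the thesis implementation. The function name `swap_tokens`, the 15% swap rate, uniform sampling of replacements from the vocabulary, the `IGNORE` value of -100, and the choice to score SLM only on swapped positions are all assumptions. C-RTS is not covered here.

```python
import random

# Assumption: label value for positions excluded from the SLM loss
# (mirrors a common convention in PyTorch-based training code).
IGNORE = -100


def swap_tokens(token_ids, vocab_size, swap_prob=0.15, seed=None):
    """Corrupt a sequence by swapping some tokens for random vocabulary tokens.

    Returns a tuple (corrupted_ids, rts_labels, slm_labels):
      - rts_labels[i] is 1 if position i was swapped, else 0
        (binary "originality" target, RTS-style).
      - slm_labels[i] is the original token id if position i was swapped,
        else IGNORE (token-recovery target, SLM-style).
    Assumes every id in token_ids lies in [0, vocab_size).
    """
    rng = random.Random(seed)
    corrupted, rts_labels, slm_labels = [], [], []
    for tok in token_ids:
        if rng.random() < swap_prob:
            # Sample uniformly from the vocabulary, excluding the original
            # token so that a swap always changes the input.
            new_tok = rng.randrange(vocab_size - 1)
            if new_tok >= tok:
                new_tok += 1
            corrupted.append(new_tok)
            rts_labels.append(1)
            slm_labels.append(tok)
        else:
            corrupted.append(tok)
            rts_labels.append(0)
            slm_labels.append(IGNORE)
    return corrupted, rts_labels, slm_labels


if __name__ == "__main__":
    ids = [5, 9, 2, 7, 3, 8, 1, 4]
    corrupted, rts, slm = swap_tokens(ids, vocab_size=10, swap_prob=0.5, seed=0)
    print("input  :", ids)
    print("swapped:", corrupted)
    print("RTS    :", rts)
    print("SLM    :", slm)
```

Under these assumptions, an RTS-style head needs only a binary classifier per token, whereas an SLM-style head predicts over the full vocabulary, as MLM does. That difference in output size is one plausible reading of why the abstract reports shorter pre-training time for RTS and C-RTS.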

Code & Models

Repositories

lucadiliello/transformers-framework (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Attention Is All You Need · BERT · WordPiece · Softmax · Dense Connections · Inverse Square Root Schedule · RoBERTa · Absolute Position Encodings