Structural Self-Supervised Objectives for Transformers
Luca Di Liello

TL;DR
This paper introduces novel self-supervised objectives for Transformer pre-training, improving efficiency and downstream task performance, especially with limited labeled data, by aligning pre-training tasks with application structures.
Contribution
It proposes new token swapping-based pre-training objectives and structurally aligned tasks that enhance NLP model performance without modifying Transformer architectures.
Findings
RTS and C-RTS reduce pre-training time with comparable performance to MLM.
SLM outperforms MLM on certain tasks within the same computational budget.
Significant improvements on benchmarks like FEVER, ASNQ, WikiQA, and TREC-QA, especially with limited labeled data.
Abstract
This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swapping instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we propose self-supervised pre-training tasks that align structurally with downstream…
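To make the difference between the token-swapping objectives concrete, here is a minimal Python sketch of how training targets might be built from one corrupted input. It is an illustration under stated assumptions, not the thesis implementation: the function names, the 15% swap probability, uniform sampling of replacement tokens, and the -100 "ignore" label are all assumptions that do not come from the abstract. C-RTS is left out because the excerpt does not say how clusters are used to pick replacements.

```python
import random

IGNORE = -100  # assumed "no loss at this position" label (a common convention)


def swap_tokens(token_ids, vocab_size, swap_prob=0.15, rng=None):
    """Replace a random subset of tokens with different random vocabulary ids.

    Hypothetical helper: returns (corrupted_ids, swapped_flags).
    Assumes every id is in [0, vocab_size) and vocab_size >= 2.
    """
    rng = rng or random.Random()
    corrupted, swapped = [], []
    for tok in token_ids:
        if rng.random() < swap_prob:
            # Sample uniformly from the vocabulary minus the original token.
            new_tok = rng.randrange(vocab_size - 1)
            if new_tok >= tok:
                new_tok += 1
            corrupted.append(new_tok)
            swapped.append(True)
        else:
            corrupted.append(tok)
            swapped.append(False)
    return corrupted, swapped


def rts_labels(swapped):
    """RTS-style targets: one binary label per token (1 = substituted)."""
    return [1 if s else 0 for s in swapped]


def slm_labels(token_ids, swapped):
    """SLM-style targets: the original id at swapped positions, IGNORE elsewhere."""
    return [tok if s else IGNORE for tok, s in zip(token_ids, swapped)]


if __name__ == "__main__":
    original = [5, 9, 2, 7, 7, 1, 3, 8]
    corrupted, swapped = swap_tokens(original, vocab_size=10, swap_prob=0.5)
    print("input :", corrupted)
    print("RTS   :", rts_labels(swapped))
    print("SLM   :", slm_labels(original, swapped))
```

Both objectives see the same corrupted input. In this sketch the RTS-style target is a binary decision per token, whereas the SLM-style target is a token id drawn from the full vocabulary.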
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Attention Is All You Need · BERT · WordPiece · Softmax · Dense Connections · Inverse Square Root Schedule · RoBERTa · Absolute Position Encodings
