Effective Pre-Training Objectives for Transformer-based Autoencoders
Luca Di Liello, Matteo Gabburo, Alessandro Moschitti

TL;DR
This paper explores efficient pre-training objectives for Transformer encoders and proposes lighter alternatives to existing methods, reducing computational cost without a significant drop in performance.
Contribution
It introduces new pre-training approaches that combine features of common objectives, and designs lightweight statistical token generators to replace computationally heavy ones such as ELECTRA's.
Findings
Light token generators significantly reduce pre-training cost.
Alternative objectives outperform BERT's MLM in efficiency.
Light pre-training approaches maintain competitive accuracy.
Abstract
In this paper, we study trade-offs between efficiency, cost and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.
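To make the idea of a light token generator concrete, here is a minimal sketch of one possible statistical generator for an ELECTRA-style replaced-token-detection setup. The abstract only says the generators follow "a straightforward statistical approach"; sampling replacements from corpus unigram frequencies is an assumption made for illustration, and the names `build_unigram_sampler`, `corrupt`, and `replace_prob` are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def build_unigram_sampler(corpus, seed=0):
    """Return a function that samples a token by its corpus frequency.

    Assumption: a unigram-frequency sampler stands in for the paper's
    "statistical" generator; the actual method may differ.
    """
    counts = Counter(tok for sentence in corpus for tok in sentence)
    vocab = sorted(counts)
    weights = [counts[tok] for tok in vocab]
    rng = random.Random(seed)

    def sample():
        return rng.choices(vocab, weights=weights, k=1)[0]

    return sample


def corrupt(tokens, sample, rng, replace_prob=0.15):
    """Replace some tokens with sampled ones and emit detection labels.

    Label 1 marks a position whose token differs from the original;
    a sampled token that happens to equal the original is labeled 0,
    as in ELECTRA's replaced-token detection.
    """
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = sample()
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


# Usage: the discriminator (the Transformer encoder being pre-trained)
# would take `corrupted` as input and be trained to predict `labels`.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
sample = build_unigram_sampler(corpus, seed=1)
corrupted, labels = corrupt(["the", "cat", "ran"], sample, random.Random(2), 0.5)
```

Unlike a neural generator, a sampler of this kind needs no forward or backward pass, which is where the cost saving described in the abstract would come from under this assumption.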
Taxonomy
Topics: Neural Networks and Applications · Music and Audio Processing · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Adam · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · WordPiece · Linear Warmup With Linear Decay
