Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation
Alexander Liu, Samuel Yang

TL;DR
This paper extends RetroMAE, an MAE-style pre-training strategy, into a unified framework for sentence representations. Pre-training runs in two stages, first on generic corpora and then on domain-specific data, and the resulting models perform well across multiple benchmarks.
Contribution
It extends MAE-style pre-training to a two-stage process with RetroMAE, enabling effective universal and domain-specific sentence representations in a single framework.
Findings
Effective for zero-shot retrieval on the BEIR benchmark
Improves downstream tasks like dense retrieval on MS MARCO and NLI
Enhances sentence embedding quality for STS and transfer tasks
Abstract
Despite the progress in pre-trained language models, there is a lack of unified frameworks for pre-trained sentence representation. As a result, different scenarios call for different pre-training methods, and the pre-trained models are likely to be limited in their universality and representation quality. In this work, we extend the recently proposed MAE-style pre-training strategy, RetroMAE, such that it may effectively support a wide variety of sentence representation tasks. The extended framework consists of two stages, with RetroMAE conducted throughout the process. The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned. The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained based on RetroMAE and contrastive learning. The pre-training…
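The abstract names contrastive learning as part of the second stage but does not give the objective in the text shown here. As a rough illustration only, the sketch below computes a common in-batch contrastive (InfoNCE-style) loss over sentence embeddings. The function name `in_batch_contrastive_loss`, the cosine-similarity scoring, and the temperature of 0.05 are all assumptions made for this example, not details taken from the paper.

```python
import math

def _normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def in_batch_contrastive_loss(queries, positives, temperature=0.05):
    """Hypothetical InfoNCE-style loss with in-batch negatives.

    queries[i] and positives[i] are embeddings of a matching pair;
    every other row of `positives` acts as a negative for queries[i].
    The temperature value is an assumption, not taken from the paper.
    """
    q = [_normalize(v) for v in queries]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every candidate in the batch.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_z - logits[i]
    return total / len(q)

# Toy embeddings: three mutually orthogonal "sentences".
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_loss = in_batch_contrastive_loss(anchors, anchors)
misaligned_loss = in_batch_contrastive_loss(anchors, anchors[1:] + anchors[:1])
```

When each query is paired with its own embedding the loss is close to zero, and when the pairs are rotated so that every query points at the wrong positive the loss becomes large. That gap is the signal a contrastive objective of this kind uses to pull matching sentences together.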
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Masked autoencoder · Balanced Selection
