Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval
Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie

TL;DR
This paper introduces a challenging decoder strategy in masked auto-encoder pre-training that enhances dense passage retrieval by improving encoder representations without extra costs.
Contribution
It proposes a novel token-importance-aware masking method, based on pointwise mutual information, that makes the decoder's reconstruction task more demanding and thereby strengthens encoder learning in an unsupervised way.
Findings
Improves dense passage retrieval performance on large-scale datasets.
Enhances zero-shot retrieval robustness.
No additional pre-training expenses incurred.
Abstract
Recently, various studies have explored dense passage retrieval techniques that employ pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on the decoder's passage reconstruction to bolster the text representation ability of the encoder, thereby enhancing the performance of the resulting dense retrieval systems. Since the encoder's representation ability is built through the decoder's passage reconstruction, it is reasonable to postulate that a "more demanding" decoder will necessitate a corresponding increase in the encoder's ability. To this end, we propose a novel token-importance-aware masking strategy based on pointwise mutual information to intensify the challenge of the decoder. Importantly, our approach can be…
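The abstract names pointwise mutual information (PMI) as the basis for token importance but is cut off before giving details. As a rough illustration only, the sketch below uses the standard definition PMI(a, b) = log(p(a, b) / (p(a) p(b))) over adjacent token pairs, scores each token by its mean PMI with its immediate neighbours, and masks the highest-scoring tokens. The scoring rule, the function names (pmi_table, token_importance, mask_important), the mask ratio, and the "[MASK]" string are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """Estimate PMI for adjacent token pairs from a tokenized corpus.

    PMI(a, b) = log(p(a, b) / (p(a) * p(b))), with probabilities taken
    as relative frequencies of bigrams and unigrams.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


def token_importance(tokens, table):
    """Hypothetical importance score: mean PMI with the left and right neighbour.

    Pairs never seen in the corpus contribute 0.0 (an assumption of this sketch).
    """
    scores = []
    for i, tok in enumerate(tokens):
        neighbours = []
        if i > 0:
            neighbours.append(table.get((tokens[i - 1], tok), 0.0))
        if i + 1 < len(tokens):
            neighbours.append(table.get((tok, tokens[i + 1]), 0.0))
        scores.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return scores


def mask_important(tokens, scores, ratio=0.3, mask_token="[MASK]"):
    """Replace the highest-scoring tokens with mask_token (at least one token)."""
    k = max(1, round(len(tokens) * ratio))
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [mask_token if i in top else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    corpus = [
        ["dense", "passage", "retrieval"],
        ["dense", "passage", "ranking"],
        ["the", "passage", "is", "long"],
    ]
    table = pmi_table(corpus)
    passage = ["dense", "passage", "retrieval"]
    print(mask_important(passage, token_importance(passage, table)))
```

In an MAE-style setup, a masked sequence like this would presumably be fed to the decoder, so that tokens carrying more information are the ones it must reconstruct; how the score is actually computed and applied in the paper is not recoverable from the truncated abstract.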
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Masked autoencoder · Attentive Walk-Aggregating Graph Neural Network
