YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker
Ruizhi Pu, Xinyu Zhang, Ruofei Lai, Zikai Guo, Yinxia Zhang, Hao Jiang, Yongkang Wu, Yantao Jia, Zhicheng Dou, Zhao Cao

TL;DR
This paper introduces Self-Involvement Ranker (SIR), a novel fine-tuning strategy for document ranking that dynamically selects hard negative samples to improve semantic space quality and ranking performance of pre-trained models.
Contribution
The paper proposes SIR, a lightweight, general framework that adaptively selects hard negatives using supervisory signals, achieving state-of-the-art results on MS MARCO document ranking.
Findings
SIR significantly improves ranking performance across models.
SIR sets a new state of the art (SOTA) on the MS MARCO document ranking leaderboard.
Dynamic negative sampling enhances semantic space quality.
Abstract
Pre-trained models such as BERT have proven to be effective tools for Information Retrieval (IR) problems. Owing to their strong performance, they have been widely used to tackle real-world IR problems such as document ranking. Recently, researchers have found that selecting "hard" rather than "random" negative samples is beneficial for fine-tuning pre-trained models on ranking tasks. However, it remains unclear how to leverage hard negative samples in a principled way. To address this issue, we propose a fine-tuning strategy for document ranking, namely Self-Involvement Ranker (SIR), which dynamically selects hard negative samples to construct a high-quality semantic space for training a high-quality ranking model. Specifically, SIR consists of sequential compressors implemented with pre-trained models. The front compressor selects hard negative samples…
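The general idea of choosing "hard" rather than "random" negatives can be illustrated with a small sketch. This is not the SIR implementation, whose details are truncated in the abstract above; it only shows one common way to pick hard negatives, namely keeping the non-relevant documents that a scorer ranks highest for the query. The names select_hard_negatives and overlap_score are hypothetical, and the term-overlap scorer is a toy stand-in for a pre-trained ranking model.

```python
from typing import Callable, List, Sequence


def select_hard_negatives(
    query: str,
    negatives: Sequence[str],
    score: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Return the k negatives that the scorer ranks highest for the query.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    ranked = sorted(negatives, key=lambda doc: score(query, doc), reverse=True)
    return list(ranked[:k])


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a pre-trained ranker: fraction of query terms in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


query = "bert document ranking"
negatives = [
    "cooking pasta at home",
    "bert for sentence classification",
    "ranking document collections with bm25",
    "weather forecast for tomorrow",
]

# Keep the two negatives that look most relevant to the query.
hard = select_hard_negatives(query, negatives, overlap_score, k=2)
print(hard)
```

In a real setting the scorer would be a fine-tuned model, and the selection would be repeated as that model changes, which is what makes the sampling dynamic rather than fixed in advance.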
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Layer Normalization · Softmax · Residual Connection
