Information retrieval for label noise document ranking by bag sampling and group-wise loss
Chunyu Li, Jiajia Ding, Xing Hu, Fan Wang

TL;DR
This paper introduces a novel bag sampling and group-wise Localized Contrastive Estimation method to improve long document ranking by reducing label noise and balancing negative sampling, achieving state-of-the-art results.
Contribution
The paper proposes a new bag sampling technique combined with group-wise LCE loss to enhance long document retrieval performance and robustness against label noise.
Findings
Achieved excellent performance on the MS MARCO long document ranking leaderboard.
Effectively reduces the impact of label noise in long document retrieval.
Balances negative sampling for improved ranking accuracy.
Abstract
Long document retrieval (DR) has always been a tremendous challenge for reading comprehension and information retrieval. In recent years, pre-trained models have achieved good results in both the retrieval and ranking stages for long documents. However, long document ranking still faces several crucial problems, such as label noise, long document representation, and unbalanced negative sampling. To eliminate the noise in the labeled data and to sample negatives from the retrieved long documents in a reasonable way, we propose the bag sampling method and the group-wise Localized Contrastive Estimation (LCE) method. We encode each long document using its head, middle, and tail passages, and in the retrieval stage we use dense retrieval to generate the candidate data. At the ranking stage, the retrieved data is divided into multiple bags, and negative samples are…
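As a rough illustration of two ideas named in the abstract, the sketch below picks head, middle, and tail passages from a tokenized document and computes a standard LCE-style loss (softmax cross-entropy over one positive and its sampled negatives). The function names, the fixed window length, and the exact form of the loss are assumptions for illustration, not the authors' implementation; the paper's group-wise variant may differ. The bag sampling step is left out because the abstract is cut off before describing it.

```python
import math
from typing import List, Sequence


def head_middle_tail(tokens: Sequence[str], passage_len: int) -> List[List[str]]:
    """Pick three fixed-length windows: start, centre, and end of the document.

    Hypothetical helper: the window length and centring rule are assumptions.
    """
    n = len(tokens)
    if n <= passage_len:
        # Short documents fit in a single passage.
        return [list(tokens)]
    mid_start = (n - passage_len) // 2
    return [
        list(tokens[:passage_len]),
        list(tokens[mid_start:mid_start + passage_len]),
        list(tokens[n - passage_len:]),
    ]


def lce_loss(scores: Sequence[float], positive_index: int = 0) -> float:
    """Softmax cross-entropy over one group of ranker scores.

    The group holds one positive document and negatives sampled from the
    retriever's candidates (assumed form of LCE, not the paper's exact loss).
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


if __name__ == "__main__":
    doc = [f"t{i}" for i in range(10)]
    print(head_middle_tail(doc, 3))
    print(lce_loss([2.0, 0.5, -1.0]))
```

The loss shrinks as the positive's score rises above the negatives in its group, which is why the choice of negatives matters for training the ranker.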
Taxonomy
Topics: Information Retrieval and Search Behavior · Text and Document Classification Technologies · Image Retrieval and Classification Techniques
