ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

Hyewon Choi, Jooyoung Choi, Hansol Jang, Hyun Kim, Chulmin Yun, ChangWook Jun, and Stanley Jungkyu Choi

arXiv:2604.11092 · cs.IR · April 14, 2026

TL;DR

ARHN is a two-stage framework that uses open-source LLMs to refine hard negatives in neural retriever training, improving data quality and retrieval effectiveness.

Contribution

ARHN introduces a novel answer-centric relabeling and filtering method that uses open-source LLMs to improve the training data of dense retrieval models.

Findings

01

Combined relabeling and filtering improve retrieval performance across datasets.

02

ARHN reduces false negatives and ambiguous negatives in training data.

03

The method is cost-effective and scalable because it relies on open-source models.

Abstract

Neural retrievers are often trained on large-scale triplet data comprising a query, a positive passage, and a set of hard negatives. In practice, hard-negative mining can introduce false negatives and other ambiguous negatives, including passages that are relevant or contain partial answers to the query. Such label noise yields inconsistent supervision and can degrade retrieval effectiveness. We propose ARHN (Answer-centric Relabeling of Hard Negatives), a two-stage framework that leverages open-source LLMs to refine hard negative samples using answer-centric relevance signals. In the first stage, for each query-passage pair, ARHN prompts the LLM to generate a passage-grounded answer snippet or to indicate that the passage does not support an answer. In the second stage, ARHN applies an LLM-based listwise ranking over the candidate set to order passages by direct answerability to the…
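The two stages described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. `answer_fn` and `rank_fn` are hypothetical stand-ins for the LLM calls. The decision rule is also assumed for illustration, because the abstract is truncated before the actual relabeling and filtering criteria. Under that rule, an answerable negative ranked above the positive is promoted, an answerable negative ranked below it is dropped, and the rest are kept as hard negatives.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical stage-1 interface: return a passage-grounded answer snippet,
# or None when the passage does not support an answer to the query.
AnswerFn = Callable[[str, str], Optional[str]]
# Hypothetical stage-2 interface: return candidate indices ordered from most
# to least directly answerable (a listwise ranking over the whole set).
RankFn = Callable[[str, Sequence[str]], List[int]]


@dataclass
class Triplet:
    query: str
    positive: str
    negatives: List[str]


@dataclass
class RefinedExample:
    query: str
    positives: List[str]
    negatives: List[str]
    dropped: List[str] = field(default_factory=list)


def refine(t: Triplet, answer_fn: AnswerFn, rank_fn: RankFn) -> RefinedExample:
    candidates = [t.positive] + t.negatives  # index 0 is the labeled positive
    # Stage 1: answer-centric signal for each query-passage pair.
    answerable = [answer_fn(t.query, p) is not None for p in candidates]
    # Stage 2: listwise ranking of all candidates by direct answerability.
    order = rank_fn(t.query, candidates)
    if sorted(order) != list(range(len(candidates))):
        raise ValueError("rank_fn must return a permutation of candidate indices")
    rank = {idx: pos for pos, idx in enumerate(order)}
    positives, negatives, dropped = [t.positive], [], []
    for i, passage in enumerate(candidates[1:], start=1):
        if answerable[i] and rank[i] < rank[0]:
            positives.append(passage)  # assumed rule: likely false negative, relabel
        elif answerable[i]:
            dropped.append(passage)    # assumed rule: ambiguous/partial, filter out
        else:
            negatives.append(passage)  # no supported answer: keep as hard negative
    return RefinedExample(t.query, positives, negatives, dropped)


# Toy stand-ins for the LLM calls, for demonstration only.
def toy_answer(query: str, passage: str) -> Optional[str]:
    return "Paris" if "Paris" in passage else None


def toy_rank(query: str, passages: Sequence[str]) -> List[int]:
    scores = [p.count("capital") + p.count("Paris") for p in passages]
    # sorted() is stable, so ties keep their input order.
    return sorted(range(len(passages)), key=lambda i: -scores[i])


example = Triplet(
    query="What is the capital of France?",
    positive="France's capital city is Paris.",
    negatives=[
        "Paris is the capital of France; the capital region surrounds it.",
        "Paris hosts many museums.",
        "Berlin is the capital of Germany.",
    ],
)
result = refine(example, toy_answer, toy_rank)
print(result.positives, result.negatives, result.dropped, sep="\n")
```

With these toy functions, the first negative is answerable and ranked above the positive, so it is promoted. The museum passage is answerable under the toy check but ranked below the positive, so it is dropped. The Berlin passage stays a hard negative. Keeping the per-pair signal and the set-level ranking separate lets a rule combine both. How ARHN actually combines them is not stated in the visible abstract.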
