References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation

Doyoung Kim; Youngjun Lee; Joeun Kim; Jihwan Bang; Hwanjun Song; Susik Yoon; Jae-Gil Lee

arXiv:2505.06552 · cs.CL · October 14, 2025

TL;DR

This paper introduces DualReform, a reference-free framework for conversational query reformulation that generates pseudo references from dialogue data, achieving near-reference-level retrieval accuracy without needing actual reference passages.

Contribution

DualReform is a novel approach that infers pseudo reference passages from conversational data using response-based inference and response refinement, eliminating the need for reference passages during training.
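The idea of using a response as a proxy for the missing reference passage can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corpus, the bag-of-words cosine retriever, the helper names (`pseudo_references`, `preference_pair`), and the use of reciprocal rank to order candidate reformulations. The response-refinement step is omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Bag-of-words cosine similarity; a stand-in for a real retriever.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, corpus, k):
    # Return indices of the top-k passages for the query.
    order = sorted(range(len(corpus)),
                   key=lambda i: cosine(query, corpus[i]),
                   reverse=True)
    return order[:k]


def pseudo_references(response, corpus, k=1):
    # Response-based inference: the response acts as the query,
    # and whatever it retrieves is treated as a pseudo reference.
    return set(retrieve(response, corpus, k))


def preference_pair(candidates, pseudo_refs, corpus):
    # Hypothetical scoring: reciprocal rank of the best pseudo reference.
    def score(query):
        ranking = retrieve(query, corpus, len(corpus))
        return max(1.0 / (ranking.index(p) + 1) for p in pseudo_refs)

    ordered = sorted(candidates, key=score, reverse=True)
    return ordered[0], ordered[-1]  # (chosen, rejected)


corpus = [
    "The Eiffel Tower in Paris was completed in 1889.",
    "The Brooklyn Bridge was completed in 1883.",
    "Python is a programming language created by Guido van Rossum.",
]
response = "The Eiffel Tower was completed in 1889 for the World's Fair."
candidates = [
    "When was it completed?",
    "When was the Eiffel Tower completed?",
]

refs = pseudo_references(response, corpus)
chosen, rejected = preference_pair(candidates, refs, corpus)
```

In this toy setup the self-contained reformulation ranks the pseudo reference first and is therefore preferred over the ambiguous one; a pair of this kind is what a preference-optimization step would consume.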

Findings

1. Achieves 96.9-99.1% of reference-based retrieval accuracy.
2. Surpasses state-of-the-art by up to 31.6%.
3. Operates effectively without relying on reference passages.

Abstract

Conversational query reformulation (CQR) has become indispensable for improving retrieval in dialogue-based applications. However, existing approaches typically rely on reference passages for optimization, which are impractical to acquire in real-world scenarios. To address this limitation, we introduce a novel reference-free preference optimization framework, DualReform, that generates pseudo reference passages from commonly encountered conversational datasets containing only queries and responses. DualReform attains this goal through two key innovations: (1) response-based inference, where responses serve as proxies to infer pseudo reference passages, and (2) response refinement via the dual role of CQR, where a CQR model refines responses based on the shared objectives between response refinement and CQR. Despite not relying on reference passages, DualReform achieves 96.9-99.1% of the…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. By iterating between pseudo-passage retrieval and preference optimization in a self-reinforcing loop, each step improves the other, leading to continuous optimization of the CQR model.
2. As in query reformulation, the response is refined to resolve ambiguities and omissions, leading to more accurate retrieval of pseudo passages.
3. Experiments demonstrate that DUALREFORM achieves retrieval accuracy very close to that of systems using true references.

Weaknesses

1. The motivation and novelty of this research are not convincing. There is no necessity for inferring pseudo passages just for training, especially when limited to conversational QR scenarios. The community already has many datasets with relevance judgments for such a purpose. Besides, the definition of reference is unclear here. If this denotes relevance judgments, then previous CQR methods do not necessarily use relevance judgments, which means the claim in the second paragraph in the Introduc

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Empirical gains are substantial and consistent across datasets and retrievers, often nearing the upper bound.
2. The iterative refinement and query-forming template are well studied with ablations and extended metrics, showing clear contributions from each design choice.

Weaknesses

1. Component-level attribution could be better: the paper reports useful ablations but does not present a unified per-module breakdown (retriever type, pseudo refs, etc.) to show which component drives most of the gains.
2. Iterative pseudo-refinement and extra retrieval steps likely increase per-query compute. Practical cost and latency figures are necessary but unreported.
3. The training pipeline relies on reformulated queries generated by ChatGPT. Can it be achieved using reformulations produced by

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Most of this work is easy to follow. The authors have given a nice preliminary study.
2. This work may be the first to achieve a reference-free preference optimization framework for CQR.
3. This work provides substantial theoretical analysis to show that dual-role CQR reduces training error vs. single-role (LLM-only) refinement.
4. Achieves strong empirical performance: 1) 90%+ of reference-based accuracy, 2) 15.7% average improvement over SOTA reference-free baseline and 3) s

Weaknesses

1. Some statements in this paper are not very clear. Most of them may have been well validated in prior work, but readers of the paper may not be familiar with them. For example, it is hard to infer Lemmas 4.2 and 4.3 solely from the Assumption. Note that reading the appendix is not compulsory. Thus, the authors should give at least a brief, intuitive introduction in the main paper.
2. I do not directly focus on this research field, so I have a concern about the definition

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications