From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System

Junhao Yin; Haolin Wang; Peng Bao; Ju Xu; Yongliang Wang

arXiv:2508.15811 · cs.CL · December 16, 2025

TL;DR

This paper presents a multi-stage framework for generative query suggestion in conversational systems, aligning model outputs with user preferences through prompt engineering, fine-tuning, probabilistic preference modeling, and reinforcement learning, resulting in improved user engagement.

Contribution

The paper introduces a novel multi-stage alignment framework incorporating Gaussian Reward Models and out-of-distribution regularization for better preference modeling and policy alignment.

Findings

1. Significant improvement over baselines in automatic and human evaluations.

2. 34% relative increase in user engagement in live A/B tests.

3. Effective modeling of user preferences as probability distributions.

Abstract

Generative query suggestion using large language models offers a powerful way to enhance conversational systems, but aligning outputs with nuanced user preferences remains a critical challenge. To address this, we introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our pipeline begins with prompt engineering as a cold-start strategy, followed by the Supervised Fine-Tuning stage, in which we introduce a distillation method on click logs to create a robust foundational model. To better model user preferences while capturing their inherent uncertainty, we develop a Gaussian Reward Model (GaRM) that represents user preferences as probability distributions rather than point estimates. Finally, we employ reinforcement learning to align the generation policy with these preferences, guided by a composite reward function that…
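To make the idea of a reward expressed as a distribution rather than a point estimate more concrete, here is a minimal sketch. The abstract does not give GaRM's actual formulation, so everything below is an assumption made for illustration: each candidate suggestion gets a hypothetical Gaussian reward with mean `mu` and standard deviation `sigma`, the two rewards are treated as independent, and the function names `preference_probability` and `pairwise_nll` are invented here. Under those assumptions, the probability that suggestion A is preferred over B is the standard normal CDF of the mean difference divided by the combined standard deviation.

```python
import math


def preference_probability(mu_a, sigma_a, mu_b, sigma_b):
    """P(reward_a > reward_b) for two independent Gaussian rewards.

    Illustrative assumption, not the paper's confirmed method: the
    difference of two independent Gaussians is Gaussian with mean
    mu_a - mu_b and variance sigma_a^2 + sigma_b^2.
    """
    diff_std = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    if diff_std == 0.0:
        # Degenerate case: both rewards are point estimates.
        if mu_a > mu_b:
            return 1.0
        return 0.5 if mu_a == mu_b else 0.0
    z = (mu_a - mu_b) / diff_std
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_nll(mu_a, sigma_a, mu_b, sigma_b, eps=1e-12):
    """Negative log-likelihood of observing 'A preferred over B'
    (for example, A was clicked and B was not)."""
    p = preference_probability(mu_a, sigma_a, mu_b, sigma_b)
    return -math.log(max(p, eps))


if __name__ == "__main__":
    # Same mean gap, different uncertainty: the more uncertain pair
    # yields a preference probability closer to 0.5.
    print(preference_probability(2.0, 0.5, 1.0, 0.5))
    print(preference_probability(2.0, 3.0, 1.0, 3.0))
```

The design point this illustrates is that a point-estimate reward model would score both pairs in the usage example identically, because the mean gap is the same, whereas a distributional model lets high uncertainty pull the preference toward indifference.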
