From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System
Junhao Yin, Haolin Wang, Peng Bao, Ju Xu, Yongliang Wang

TL;DR
This paper presents a multi-stage framework for generative query suggestion in conversational systems. It aligns model outputs with user preferences through prompt engineering, supervised fine-tuning, probabilistic preference modeling, and reinforcement learning, and it improves user engagement in live A/B tests.
Contribution
The paper introduces a novel multi-stage alignment framework that incorporates a Gaussian Reward Model (GaRM) and out-of-distribution regularization for better preference modeling and policy alignment.
Findings
Significant improvement over baselines in automatic and human evaluations.
34% relative increase in user engagement in live A/B tests.
Effective modeling of user preferences as probability distributions.
Abstract
Generative query suggestion using large language models offers a powerful way to enhance conversational systems, but aligning outputs with nuanced user preferences remains a critical challenge. To address this, we introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our pipeline begins with prompt engineering as a cold-start strategy, followed by the Supervised Fine-Tuning stage, in which we introduce a distillation method on click logs to create a robust foundational model. To better model user preferences while capturing their inherent uncertainty, we develop a Gaussian Reward Model (GaRM) that represents user preferences as probability distributions rather than point estimates. Finally, we employ reinforcement learning to align the generation policy with these preferences, guided by a composite reward function that…
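The abstract does not give GaRM's exact formulation, so the following is only a minimal sketch of one common way to compare two candidates when each reward is modeled as a Gaussian rather than a point estimate. It assumes the two rewards are independent, so their difference is also Gaussian and the preference probability has a closed form. The function names and the (mean, standard deviation) parameterization are hypothetical and are not taken from the paper.

```python
import math


def preference_probability(mu_a, sigma_a, mu_b, sigma_b):
    """Probability that suggestion A is preferred over suggestion B.

    Assumption (not from the paper): rewards are independent Gaussians,
    r_a ~ N(mu_a, sigma_a^2) and r_b ~ N(mu_b, sigma_b^2), so
    r_a - r_b ~ N(mu_a - mu_b, sigma_a^2 + sigma_b^2).
    At least one standard deviation must be positive.
    """
    z = (mu_a - mu_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_nll(mu_a, sigma_a, mu_b, sigma_b):
    """Negative log-likelihood of observing 'A preferred over B'
    (for example, A was clicked and B was not)."""
    return -math.log(preference_probability(mu_a, sigma_a, mu_b, sigma_b))


# Same mean gap, different uncertainty: a wider distribution yields
# a less confident preference.
confident = preference_probability(1.0, 0.2, 0.0, 0.2)
uncertain = preference_probability(1.0, 2.0, 0.0, 2.0)
```

Under this sketch, larger predicted variances pull the preference probability toward 0.5, which is one way a distributional reward can express uncertainty that a point estimate cannot.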
