Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering
Maryam Amirizaniani, Benjamin Charles Germain Lee, Jevin West, Nicholas Weber

TL;DR
This paper introduces IAP, a reinforcement learning framework that trains language models to infer and incorporate implicit user intent from single-turn questions, improving personalized question answering performance.
Contribution
The paper presents a novel reinforcement learning approach that explicitly models user intent during training, enhancing single-turn personalized question answering capabilities.
Findings
IAP outperforms all baselines on the LaMP-QA benchmark.
It achieves an average macro-score gain of 7.5% over the strongest baseline.
It effectively infers implicit user intent from minimal, single-turn input.
Abstract
Effective personalized question answering (PQA) in language models requires grounding responses in the user's underlying intent, where intent refers to the implicit "why" behind a query beyond its explicit wording. However, existing approaches to intent-aware personalization rely on multi-turn conversational context or rich user profiles, and do not explicitly model user intent during the reasoning process. This limits their effectiveness in single-turn settings, where the user's latent goal must be inferred from minimal input and integrated into the thinking and reasoning process. To bridge this gap, we propose IAP (Intent-Aware Personalization), a reinforcement learning framework that trains models to infer implicit user intent directly from a single-turn question and incorporate it into thinking steps through a tag-based schema for generating personalized, intent-grounded answers.…
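To make the idea of a tag-based schema concrete, here is a minimal sketch of how a format check on a model response might look in an RL reward pipeline. Everything specific here is an assumption for illustration: the tag names (`intent`, `think`, `answer`), their order, the helper names `parse_response` and `format_reward`, and the binary reward value are hypothetical and are not taken from the paper, whose actual schema and reward design are not given in this excerpt.

```python
import re

# Hypothetical tag names and order; the paper's actual schema may differ.
TAGS = ("intent", "think", "answer")

# One block per tag, in a fixed order, with nothing else around them.
_PATTERN = re.compile(
    r"^\s*<intent>(?P<intent>.+?)</intent>\s*"
    r"<think>(?P<think>.+?)</think>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


def parse_response(text):
    """Return a dict of tag contents, or None if the response is malformed."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    fields = {tag: match.group(tag).strip() for tag in TAGS}
    for value in fields.values():
        # Reject empty sections and nested or repeated tags.
        if not value:
            return None
        if any(f"<{t}>" in value or f"</{t}>" in value for t in TAGS):
            return None
    return fields


def format_reward(text):
    """Assumed binary format reward: 1.0 for a well-formed response, else 0.0."""
    return 1.0 if parse_response(text) is not None else 0.0
```

In a setup like this, a format reward would typically be combined with a separate answer-quality signal; how IAP actually scores responses is not stated in the text above.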
