Learning from Natural Language Feedback for Personalized Question Answering
Alireza Salemi, Hamed Zamani

TL;DR
This paper introduces VAC, a framework that replaces scalar rewards with natural language feedback when training large language models for personalized question answering, improving response quality.
Contribution
The paper proposes VAC, a framework that trains personalized LLMs with natural language feedback conditioned on the user profile and the question narrative, in place of scalar reward signals.
Findings
VAC outperforms state-of-the-art methods on the LaMP-QA benchmark
Human evaluations favor VAC-generated responses
Natural language feedback improves learning efficiency
Abstract
Personalization is crucial for enhancing both the effectiveness and user satisfaction of language technologies, particularly in information-seeking tasks like question answering. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG), followed by reinforcement learning with scalar reward signals to teach models how to use retrieved personal context. We believe that these scalar rewards sometimes provide weak, non-instructive feedback, limiting learning efficiency and personalization quality. We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF) that is generated conditioned on the user profiles and the question narratives. NLF serves as a rich and actionable supervision signal, allowing the policy model to iteratively refine its outputs and…
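The feedback-then-refine idea in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' implementation: the names `Example`, `refine_with_feedback`, `toy_policy`, and `toy_feedback` are hypothetical, and both "models" are simple string functions standing in for LLM calls. It shows a feedback function that sees the user profile and question narrative, and a policy that revises its answer using that feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    question: str
    profile: List[str]  # retrieved personal context (hypothetical format)
    narrative: str      # question narrative (hypothetical format)


# Hypothetical signatures for the two components.
Policy = Callable[[str, List[str], Optional[str], Optional[str]], str]
Feedback = Callable[[str, List[str], str], Optional[str]]


def refine_with_feedback(ex: Example, policy: Policy,
                         feedback_model: Feedback, rounds: int = 2) -> List[str]:
    """Return the initial response followed by one response per round."""
    response = policy(ex.question, ex.profile, None, None)
    history = [response]
    for _ in range(rounds):
        # Feedback is text conditioned on the profile and narrative,
        # rather than a single scalar score.
        feedback = feedback_model(response, ex.profile, ex.narrative)
        response = policy(ex.question, ex.profile, response, feedback)
        history.append(response)
    return history


def toy_policy(question, profile, previous, feedback):
    """Stub policy: a real system would call an LLM here."""
    if previous is None:
        return f"Generic answer to: {question}"
    if feedback is None:
        return previous  # nothing to fix, keep the response unchanged
    return f"{previous} (revised to address: {feedback})"


def toy_feedback(response, profile, narrative):
    """Stub feedback: name the first profile fact the response ignores."""
    missing = [fact for fact in profile if fact not in response]
    return missing[0] if missing else None


example = Example(
    question="Where should I eat tonight?",
    profile=["vegetarian", "lives in Boston"],
    narrative="The user wants a dinner recommendation.",
)
history = refine_with_feedback(example, toy_policy, toy_feedback, rounds=3)
```

In this toy setting each round folds one more profile fact into the answer, and the response stops changing once the feedback function has nothing left to point out. How VAC uses the refined outputs during training is not covered by this sketch.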
