Learning from Natural Language Feedback for Personalized Question Answering

Alireza Salemi; Hamed Zamani

arXiv:2508.10695·cs.CL·April 27, 2026

TL;DR

This paper introduces VAC, a framework that replaces scalar rewards with natural language feedback when training large language models for personalized question answering, improving the quality of the generated responses.

Contribution

The paper proposes VAC, a framework that trains personalized LLMs with natural language feedback rather than scalar rewards, improving both personalization and response quality.

Findings

01

VAC outperforms state-of-the-art methods on the LaMP-QA benchmark

02

Human evaluations favor VAC-generated responses

03

Natural language feedback improves learning efficiency

Abstract

Personalization is crucial for enhancing both the effectiveness and user satisfaction of language technologies, particularly in information-seeking tasks like question answering. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG), followed by reinforcement learning with scalar reward signals to teach models how to use retrieved personal context. We believe that these scalar rewards sometimes provide weak, non-instructive feedback, limiting learning efficiency and personalization quality. We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF) that is generated conditioned on the user profiles and the question narratives. NLF serves as a rich and actionable supervision signal, allowing the policy model to iteratively refine its outputs and…
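
The feedback-and-refine step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `Example` fields, the `refine_with_feedback` helper, and the three toy functions standing in for the policy model and the feedback model are all hypothetical. The sketch only shows feedback being conditioned on the user profile and question narrative and then used to revise a response; how the refined outputs feed into training is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    narrative: str       # the user's longer description of the question (assumed field)
    profile: List[str]   # retrieved personal context (assumed field)


def refine_with_feedback(
    example: Example,
    generate: Callable[[str, List[str]], str],
    give_feedback: Callable[[List[str], str, str], str],
    revise: Callable[[str, str, str], str],
    rounds: int = 1,
) -> Tuple[str, str]:
    """Return (initial, refined) responses after `rounds` feedback steps."""
    initial = generate(example.question, example.profile)
    refined = initial
    for _ in range(rounds):
        # Feedback is conditioned on the profile and the narrative.
        feedback = give_feedback(example.profile, example.narrative, refined)
        refined = revise(example.question, refined, feedback)
    return initial, refined


# Toy stand-ins (hypothetical): a real system would call LLMs here.
def toy_generate(question: str, profile: List[str]) -> str:
    return f"Generic answer to: {question}"


def toy_feedback(profile: List[str], narrative: str, response: str) -> str:
    missing = [item for item in profile if item not in response]
    return "Mention: " + "; ".join(missing) if missing else "Looks personalized."


def toy_revise(question: str, response: str, feedback: str) -> str:
    prefix = "Mention: "
    if feedback.startswith(prefix):
        return f"{response} ({feedback[len(prefix):]})"
    return response


example = Example(
    question="How do I start running?",
    narrative="I had knee surgery last year.",
    profile=["knee surgery", "beginner"],
)
initial, refined = refine_with_feedback(example, toy_generate, toy_feedback, toy_revise)
print(initial)
print(refined)
```

In this toy setup the textual feedback names what the response is missing, which is the kind of actionable signal a single scalar score cannot carry.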
