Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning

Yahui Fu; Zi Haur Pang; Tatsuya Kawahara

arXiv:2511.05407 · cs.CL · November 10, 2025

TL;DR

This paper introduces a novel framework for dialogue systems that models individual and group user preferences to improve satisfaction estimation, especially for minority users, using interpretable reasoning and unsupervised clustering within a reinforcement learning setup.

Contribution

It presents a unified approach combining personalized reasoning, unsupervised group discovery, and preference-aware reinforcement learning for better user satisfaction estimation.

Findings

01. Improved satisfaction estimation accuracy for minority user groups

02. Effective modeling of individual user preferences through interpretable reasoning

03. Enhanced dialogue system alignment with diverse user preferences

Abstract

User satisfaction in dialogue systems is inherently subjective. When the same response strategy is applied across users, minority users may assign different satisfaction ratings than majority users due to variations in individual intents and preferences. However, existing alignment methods typically train one-size-fits-all models that aim for broad consensus, often overlooking minority perspectives and user-specific adaptation. We propose a unified framework that models both individual- and group-level preferences for user satisfaction estimation. First, we introduce Chain-of-Personalized-Reasoning (CoPeR) to capture individual preferences through interpretable reasoning chains. Second, we propose an expectation-maximization-based Majority-Minority Preference-Aware Clustering (M2PC) algorithm that discovers distinct user groups in an unsupervised manner to learn group-level preferences.…
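The abstract names an expectation-maximization-based clustering step, but the text available here does not give its details. As a generic illustration of how EM can separate a majority group from a minority group without labels, the following minimal sketch fits a mixture of independent Bernoulli distributions to per-user binary satisfaction labels. It is not the M2PC algorithm: the function name `em_bernoulli_mixture`, the binary-rating representation, the Bernoulli likelihood, and the smoothing constant are all assumptions made for this example.

```python
import math
import random


def em_bernoulli_mixture(ratings, n_groups=2, n_iters=50, seed=0):
    """Fit a mixture of independent Bernoullis with EM (illustrative only).

    ratings: list of per-user lists of 0/1 satisfaction labels, one label per
             response strategy; all lists must have the same length.
    Returns (weights, probs, resp): group mixing weights, per-group
    satisfaction probabilities, and per-user soft group memberships.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    eps = 1e-6  # hypothetical smoothing constant; keeps probabilities in (0, 1)

    # Random initialisation breaks the symmetry between groups.
    weights = [1.0 / n_groups] * n_groups
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_groups)]
    resp = []

    for _ in range(n_iters):
        # E-step: posterior probability of each group for each user.
        resp = []
        for x in ratings:
            log_post = []
            for g in range(n_groups):
                lp = math.log(weights[g])
                for j, xj in enumerate(x):
                    p = probs[g][j]
                    lp += math.log(p) if xj else math.log(1.0 - p)
                log_post.append(lp)
            top = max(log_post)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate weights and per-group satisfaction rates.
        for g in range(n_groups):
            mass = sum(r[g] for r in resp)
            weights[g] = (mass + eps) / (n_users + n_groups * eps)
            for j in range(n_items):
                num = sum(r[g] * x[j] for r, x in zip(resp, ratings))
                probs[g][j] = (num + eps) / (mass + 2.0 * eps)

    return weights, probs, resp


def hard_labels(resp):
    """Assign each user to the group with the highest posterior."""
    return [max(range(len(r)), key=lambda g: r[g]) for r in resp]
```

On a toy input such as eight users who rate the first two strategies as satisfying and two users who rate the last two as satisfying, `hard_labels` would be expected to place the two kinds of user in different groups, with the smaller mixing weight marking the minority group. Which index ends up as the minority depends on the random initialisation, so downstream code should identify groups by weight rather than by index.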

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Emotion and Mood Recognition