Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning
Yahui Fu, Zi Haur Pang, Tatsuya Kawahara

TL;DR
This paper introduces a framework for user satisfaction estimation in dialogue systems that models both individual- and group-level preferences. It combines interpretable reasoning chains with unsupervised clustering inside a reinforcement learning setup, with the aim of improving estimates for minority users.
Contribution
It presents a unified approach combining personalized reasoning, unsupervised group discovery, and preference-aware reinforcement learning for better user satisfaction estimation.
Findings
Improved satisfaction estimation accuracy for minority user groups
Effective modeling of individual user preferences through interpretable reasoning
Enhanced dialogue system alignment with diverse user preferences
Abstract
User satisfaction in dialogue systems is inherently subjective. When the same response strategy is applied across users, minority users may assign different satisfaction ratings than majority users due to variations in individual intents and preferences. However, existing alignment methods typically train one-size-fits-all models that aim for broad consensus, often overlooking minority perspectives and user-specific adaptation. We propose a unified framework that models both individual- and group-level preferences for user satisfaction estimation. First, we introduce Chain-of-Personalized-Reasoning (CoPeR) to capture individual preferences through interpretable reasoning chains. Second, we propose an expectation-maximization-based Majority-Minority Preference-Aware Clustering (M2PC) algorithm that discovers distinct user groups in an unsupervised manner to learn group-level preferences.…
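The abstract describes M2PC only as an expectation-maximization-based clustering that separates majority and minority users without supervision, and the text is truncated before any details. As a rough illustration of that general idea, and not the authors' algorithm, the sketch below fits a generic two-component Bernoulli mixture with EM. It assumes each user is summarized by a binary vector recording whether they were satisfied with each of several response strategies. The function name `em_bernoulli_mixture`, the toy data, and the initialization scheme are all hypothetical.

```python
import math

def em_bernoulli_mixture(X, n_iter=50, eps=1e-6):
    """Fit a 2-component Bernoulli mixture with EM (generic sketch, not M2PC).

    X is a list of binary vectors, one per user. Returns (weights, params, resp),
    where resp[i][k] is the posterior probability that user i belongs to group k.
    """
    n, d = len(X), len(X[0])
    # Assumed deterministic init: seed group 0 from the first user and group 1
    # from the user whose ratings differ most from the first user's.
    far = max(range(n), key=lambda i: sum(a != b for a, b in zip(X[0], X[i])))
    params = [[0.25 + 0.5 * x for x in X[0]], [0.25 + 0.5 * x for x in X[far]]]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in range(n)]
    for _ in range(n_iter):
        # E-step: posterior group membership per user, computed in log space.
        for i, x in enumerate(X):
            logp = []
            for k in range(2):
                lp = math.log(weights[k])
                for j in range(d):
                    p = params[k][j]
                    lp += math.log(p if x[j] else 1.0 - p)
                logp.append(lp)
            m = max(logp)
            z = [math.exp(v - m) for v in logp]
            s = sum(z)
            resp[i] = [v / s for v in z]
        # M-step: re-estimate group sizes and per-strategy satisfaction rates,
        # clamped away from 0 and 1 so the logs above stay finite.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = min(max(nk / n, eps), 1.0 - eps)
            for j in range(d):
                num = sum(resp[i][k] * X[i][j] for i in range(n))
                params[k][j] = min(max(num / max(nk, eps), eps), 1.0 - eps)
    return weights, params, resp

# Toy data (hypothetical): 8 users like strategies 0-1, 2 users like strategies 2-3.
X = [[1, 1, 0, 0]] * 8 + [[0, 0, 1, 1]] * 2
weights, params, resp = em_bernoulli_mixture(X)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
minority = min(range(2), key=lambda k: weights[k])
```

Here the group with the smaller mixing weight is treated as the minority group. A real system would need richer user representations than binary vectors, and the paper's actual objective and features may differ from this sketch.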
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Emotion and Mood Recognition
