Aligning Dialogue Agents with Global Feedback via Large Language Model Multimodal Reward Decomposition
Dong Won Lee, Hae Won Park, Cynthia Breazeal, Louis-Philippe Morency

TL;DR
This paper introduces a reward decomposition framework in which a frozen large language model turns a single session-level feedback signal into turn-level rewards for aligning dialogue agents, optionally using multimodal cues such as pitch, gaze, and facial affect.
Contribution
The work presents a new LLM-based reward decomposition method that leverages session-level feedback and multimodal cues for better dialogue agent training, eliminating manual reward shaping.
Findings
Significant improvement in human-evaluated conversation quality.
Multimodal cues enhance reward inference accuracy.
LLMs effectively decompose global feedback into fine-grained rewards.
Abstract
We propose a large language model based reward decomposition framework for aligning dialogue agents using only a single session-level feedback signal. We leverage the reasoning capabilities of a frozen, pretrained large language model (LLM) to infer fine-grained local implicit rewards by decomposing global, session-level feedback. Our first text-only variant prompts the LLM to perform reward decomposition using only the dialogue transcript. The second multimodal variant incorporates additional behavioral cues, such as pitch, gaze, and facial affect, expressed as natural language descriptions. These inferred turn-level rewards are distilled into a lightweight reward model, which we utilize for RL-based fine-tuning for dialogue generation. We evaluate both text-only and multimodal variants against state-of-the-art reward decomposition methods and demonstrate notable…
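Illustrative sketch
The decomposition step described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the Turn class, the prompt wording, and the helper names build_prompt and rescale_to_global are hypothetical, and the rule that turn-level rewards should sum to the session-level score is an assumption made here for illustration. The LLM call itself is omitted; raw_scores stands in for whatever per-turn scores the LLM would return.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: str
    text: str
    # Behavioral cues already rendered as natural language (hypothetical field),
    # e.g. "pitch rising", "gaze averted".
    cues: List[str] = field(default_factory=list)


def build_prompt(turns: List[Turn], global_score: float, multimodal: bool = False) -> str:
    """Assemble a decomposition prompt (wording is an assumption, not the paper's)."""
    lines = [
        f"The session received an overall feedback score of {global_score}.",
        "Assign each turn a score reflecting its contribution to that feedback.",
        "",
    ]
    for i, turn in enumerate(turns):
        line = f"[{i}] {turn.speaker}: {turn.text}"
        # The multimodal variant appends the cue descriptions to each turn.
        if multimodal and turn.cues:
            line += " (" + "; ".join(turn.cues) + ")"
        lines.append(line)
    return "\n".join(lines)


def rescale_to_global(raw_scores: List[float], global_score: float) -> List[float]:
    """Rescale per-turn scores so they sum to the session-level score.

    The sum constraint is an assumption of this sketch.
    """
    if not raw_scores:
        return []
    total = sum(raw_scores)
    if total == 0:
        # No signal from the per-turn scores: spread the feedback evenly.
        return [global_score / len(raw_scores)] * len(raw_scores)
    return [global_score * s / total for s in raw_scores]


turns = [
    Turn("agent", "How was your day?", ["gaze averted"]),
    Turn("user", "Pretty good, thanks for asking!", ["pitch rising"]),
]
text_prompt = build_prompt(turns, 4.0)
mm_prompt = build_prompt(turns, 4.0, multimodal=True)
turn_rewards = rescale_to_global([1.0, 3.0], 4.0)
```

In a full pipeline, the rescaled turn-level rewards would serve as training targets for the lightweight reward model that the abstract mentions, which is then used during RL-based fine-tuning.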
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications
