GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization

Zhangyu Jin; Maksim Siniukov; Deuksin Kwon; Ashutosh Chaubey; Mohammad Soleymani

arXiv:2603.25020 · cs.CV · March 27, 2026

Open Access

TL;DR

GDPO-Listener is a framework for generating expressive 3D head motion, for both speaking and listening, in dyadic virtual interactions. It targets the regression-to-the-mean collapse that leaves listener faces static by pairing Auto-Regressive Flow Matching with Group reward-Decoupled Policy Optimization.

Contribution

It introduces an Auto-Regressive Flow Matching architecture and a Group reward-Decoupled Policy Optimization method for expressive, controllable head motion generation.
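
As a rough illustration of what auto-regressive flow matching can look like, the sketch below uses the standard straight-line flow matching objective and Euler sampling, with each new frame conditioned on the frames generated so far. This is not the authors' implementation: the function names, the `predict_velocity(x, t, context)` interface, the Gaussian noise source, and the step count are all assumptions made for illustration, and the network itself is replaced by a toy closed-form velocity field.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path x0 -> x1; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, context, rng):
    """One-sample estimate of the flow matching regression loss for a frame x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]      # noise sample
    t = rng.random()                            # time in [0, 1)
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t, context)   # hypothetical model call
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


def sample_next_frame(predict_velocity, context, dim, rng, steps=10):
    """Euler-integrate the velocity field from noise (t=0) towards a frame (t=1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt, context)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def generate(predict_velocity, history, n_frames, dim, rng):
    """Auto-regressive rollout: every new frame is conditioned on all earlier frames."""
    frames = list(history)
    for _ in range(n_frames):
        frames.append(sample_next_frame(predict_velocity, frames, dim, rng))
    return frames


# Toy stand-in for a trained network (assumption): the exact velocity field
# that transports any point to the constant frame [1.0, 1.0, 1.0].
def toy_velocity(x, t, context):
    return [(1.0 - xi) / (1.0 - t) for xi in x]


rng = random.Random(0)
frames = generate(toy_velocity, [[0.0, 0.0, 0.0]], n_frames=4, dim=3, rng=rng)
```

In a real system `predict_velocity` would be a learned network that also receives audio and the interlocutor's motion as context; the toy field is only there to keep the sketch runnable.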

Findings

01 Outperforms baselines in kinematic variance and expressivity

02 Achieves semantic controllability in head motion

03 Demonstrates stability and high quality in long-term interactions

Abstract

Generating realistic 3D head motion for dyadic interactions is a significant challenge in virtual human synthesis. While recent methods achieve impressive results with speaking heads, they frequently suffer from the 'Regression-to-the-Mean' problem in listener motions, collapsing into static faces, and lack the parameter space for complex nonverbal motions. In this paper, we propose GDPO-Listener, a novel framework that achieves highly expressive speaking and listening motion generation. First, we introduce an Auto-Regressive Flow Matching architecture enabling stable supervised learning. Second, to overcome kinematic stillness, we apply Group reward-Decoupled Policy Optimization (GDPO). By isolating reward normalization across distinct FLAME parameter groups, GDPO explicitly incentivizes high-variance, expressive generations. Finally, we enable explicit semantic text control for…
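
To make the idea of isolating reward normalization per parameter group concrete, the sketch below contrasts two ways of turning rewards into advantages over a group of sampled motions: summing all reward terms and normalizing once, versus normalizing each term across the group and then summing. It is a minimal sketch under stated assumptions, not the paper's formulation: the group names (`expression`, `head_pose`), the reward values, and the plain sum used to combine normalized terms are all hypothetical.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance normalization across one group of samples."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def coupled_advantages(rewards):
    """Baseline: sum the reward terms per sample, then normalize once."""
    totals = [sum(r.values()) for r in rewards]
    return normalize(totals)


def decoupled_advantages(rewards):
    """Normalize each reward term across the samples, then sum the results."""
    keys = list(rewards[0])
    per_key = {k: normalize([r[k] for r in rewards]) for k in keys}
    return [sum(per_key[k][i] for k in keys) for i in range(len(rewards))]


# Hypothetical rewards for three sampled motions. The head-pose term lives on
# a much larger numeric scale than the expression term.
rewards = [
    {"expression": 0.0, "head_pose": 30.0},
    {"expression": 1.0, "head_pose": 20.0},
    {"expression": 0.5, "head_pose": 10.0},
]

coupled = coupled_advantages(rewards)
decoupled = decoupled_advantages(rewards)
best_coupled = coupled.index(max(coupled))        # index of top-ranked sample
best_decoupled = decoupled.index(max(decoupled))
```

In this toy case the coupled scheme ranks sample 0 highest because the large-scale head-pose term dominates the sum, while the decoupled scheme ranks sample 1 highest because each term contributes on an equal footing after its own normalization.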

Taxonomy

Topics: Human Motion and Animation · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis