DiffListener: Discrete Diffusion Model for Listener Generation

Siyeol Jung; Taehwan Kim

arXiv:2502.06822·cs.LG·February 12, 2025

Open Access

TL;DR

DiffListener introduces a discrete diffusion model that generates natural, synchronized listener responses from multimodal speaker cues, improving over autoregressive methods by explicitly modeling facial dynamics.

Contribution

The paper presents a non-autoregressive, discrete-diffusion approach to listener head generation that incorporates facial differential information.

Findings

01 Achieves state-of-the-art quantitative performance.

02 Produces highly natural and context-aware reactions.

03 User study confirms high naturalness and synchronization.

Abstract

The listener head generation (LHG) task aims to generate natural nonverbal listener responses based on the speaker's multimodal cues. Prior work either relies on limited modalities (e.g., audio and facial information) or employs autoregressive approaches, which suffer from limitations such as accumulating prediction errors. To address these limitations, we propose DiffListener, a discrete-diffusion-based approach for non-autoregressive listener head generation. Our model takes the speaker's facial information, audio, and text as inputs, and additionally incorporates facial differential information to represent the temporal dynamics of expressions and movements. With this explicit modeling of facial dynamics, DiffListener can generate coherent reaction sequences in a non-autoregressive manner. Through comprehensive experiments, DiffListener demonstrates state-of-the-art performance in both…
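The abstract does not define "facial differential information" precisely. One plausible reading is a first-order temporal difference between consecutive frames of facial coefficients. The sketch below illustrates that reading only; the function names, the zero-padding of the first frame, and the concatenation step are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Frame = List[float]


def facial_differences(frames: List[Frame]) -> List[Frame]:
    """Frame-to-frame differences of per-frame facial coefficients.

    Assumption: the first frame has no predecessor, so it is given an
    all-zero difference to keep the output the same length as the input.
    """
    if not frames:
        return []
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")
    diffs = [[0.0] * dim]
    for prev, cur in zip(frames, frames[1:]):
        diffs.append([c - p for p, c in zip(prev, cur)])
    return diffs


def with_differences(frames: List[Frame]) -> List[Frame]:
    """Concatenate each frame with its difference vector (hypothetical input layout)."""
    return [frame + diff for frame, diff in zip(frames, facial_differences(frames))]


# Three toy frames with two coefficients each.
frames = [[0.0, 1.0], [0.5, 1.0], [0.5, 3.0]]
diffs = facial_differences(frames)
augmented = with_differences(frames)
```

Here `diffs` holds one motion vector per frame, and `augmented` doubles the per-frame dimension by appending that vector, so a model consuming all frames at once still sees how each coefficient is changing over time.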

Taxonomy

Topics: Music and Audio Processing

Methods: Diffusion