SentiAvatar: Towards Expressive and Interactive Digital Humans

Chuhao Jin; Rui Zhang; Qingzhe Gao; Haoyu Shi; Dayu Wu; Yichen Jiang; Yihan Wu; Ruihua Song

arXiv:2604.02908 · cs.CV · April 21, 2026

TL;DR

SentiAvatar is a framework for creating expressive, interactive 3D digital humans that synchronize speech, gestures, and emotions in real time, leveraging large-scale multimodal data and a novel motion generation architecture.

Contribution

It introduces SuSuInterActs, a new multimodal dialogue dataset; a Motion Foundation Model pre-trained on 200K+ motion sequences; and an audio-aware plan-then-infill motion generation architecture for realistic digital humans.

Findings

01

Achieved state-of-the-art results on the SuSuInterActs and BEATv2 datasets.

02

Generated 6 seconds of motion in 0.3 seconds with multi-turn streaming.

03

Produced highly synchronized speech, gestures, and expressions in real time.
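
Finding 02 can be restated as a real-time factor (RTF): generation time divided by the duration of the motion produced, so values below 1.0 mean faster than playback. The sketch below is only an illustrative calculation. The helper name `real_time_factor` is hypothetical, and the page does not state the hardware or settings behind the 0.3-second figure.

```python
def real_time_factor(compute_seconds: float, motion_seconds: float) -> float:
    """Generation time divided by generated-motion duration; < 1.0 is faster than real time."""
    if motion_seconds <= 0:
        raise ValueError("motion_seconds must be positive")
    return compute_seconds / motion_seconds


# Figures quoted in finding 02: 6 s of motion generated in 0.3 s.
rtf = real_time_factor(compute_seconds=0.3, motion_seconds=6.0)
speedup = 1.0 / rtf  # how many seconds of motion are produced per second of compute
print(f"RTF = {rtf:.2f}, i.e. about {speedup:.0f}x faster than playback")
```

With these inputs the RTF is about 0.05, roughly 20 times faster than playback.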

Abstract

We present SentiAvatar, a framework for building expressive interactive 3D digital humans, and use it to create SuSu, a virtual character that speaks, gestures, and emotes in real time. Achieving such a system remains challenging, as it requires jointly addressing three key problems: the lack of large-scale, high-quality multimodal data, robust semantic-to-motion mapping, and fine-grained frame-level motion-prosody synchronization. To solve these problems, first, we build SuSuInterActs (21K clips, 37 hours), a dialogue corpus captured via optical motion capture around a single character with synchronized speech, full-body motion, and facial expressions. Second, we pre-train a Motion Foundation Model on 200K+ motion sequences, equipping it with rich action priors that go well beyond the conversation. We then propose an audio-aware plan-then-infill architecture that decouples…
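
Dividing the corpus duration by the clip count gives a rough sense of clip length in SuSuInterActs. This is a back-of-the-envelope sketch. Both inputs are rounded ("21K clips", "37 hours"), so the result is approximate, and `mean_clip_seconds` is a hypothetical helper, not part of any released dataset tooling.

```python
# Rounded corpus figures quoted in the abstract.
NUM_CLIPS = 21_000   # "21K clips"
TOTAL_HOURS = 37     # "37 hours"


def mean_clip_seconds(num_clips: int, total_hours: float) -> float:
    """Average clip length in seconds, spreading the total duration evenly over all clips."""
    if num_clips <= 0:
        raise ValueError("num_clips must be positive")
    return total_hours * 3600 / num_clips


avg = mean_clip_seconds(NUM_CLIPS, TOTAL_HOURS)
print(f"Average clip length is roughly {avg:.1f} s")
```

This works out to about 6.3 seconds per clip on average. The actual distribution of clip lengths is not given here.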

Code & Models

Repositories

https://sentiavatar.github.io

Datasets

Chuhaojin/SuSuInterActs (dataset, 592 downloads)
