Towards Interactive Intelligence for Digital Humans

Yiyi Cai; Xuangeng Chu; Xiwei Gao; Sitong Gong; Yifei Huang; Caixin Kang; Kunhang Li; Haiyang Liu; Ruicong Liu; Yun Liu; Dianwen Ng; Zixiong Su; Erwin Wu; Yuhan Wu; Dingkun Yan; Tianyu Yan; Chang Zeng; Bo Zheng; You Zhou

arXiv:2512.13674·cs.CV·March 16, 2026

Open Access

TL;DR

This paper introduces Mio, a comprehensive framework for digital humans with interactive intelligence, combining multimodal embodiment and cognitive reasoning to enable realistic, adaptive, and personality-aligned interactions.

Contribution

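The paper presents Mio (Multimodal Interactive Omni-Avatar), an end-to-end multimodal framework built from five modules (Thinker, Talker, Face Animator, Body Animator, and Renderer), and establishes a new benchmark for evaluating interactive intelligence in digital humans.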

Findings

01. Mio outperforms state-of-the-art methods in interaction quality.

02. The framework enables personality-aligned expression, adaptive interaction, and self-evolution.

03. Extensive experiments show superior performance across all evaluated dimensions of interactive intelligence.

Abstract

We introduce Interactive Intelligence, a novel paradigm for digital humans capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face Animator, Body Animator, and Renderer. This unified architecture integrates cognitive reasoning with real-time multimodal embodiment to enable fluid, consistent interaction. Furthermore, we establish a new benchmark to rigorously evaluate the capabilities of interactive intelligence. Extensive experiments demonstrate that our framework achieves superior performance compared to state-of-the-art methods across all evaluated dimensions. Together, these contributions move digital humans beyond superficial imitation toward intelligent interaction.

Taxonomy

Topics: Social Robot Interaction and HRI · Human Motion and Animation · Face recognition and analysis