InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE

Lipeng Wang; Hongxing Fan; Haohua Chen; Zehuan Huang; Lu Sheng

arXiv:2511.13488 · cs.CV · November 18, 2025


TL;DR

InterMoE is a new framework that generates personalized 3D human interactions by dynamically focusing on critical motion features, achieving state-of-the-art results in fidelity and individual identity preservation.

Contribution

The paper introduces a Dynamic Temporal-Selective Mixture of Experts (MoE) whose routing mechanism combines text semantics with motion context to generate personalized interactions.

Findings

01. Reduces FID scores by 9% on the InterHuman dataset.

02. Reduces FID scores by 22% on the InterX dataset.

03. Achieves state-of-the-art performance in individual-specific 3D human interaction generation.

Abstract

Generating high-quality human interactions holds significant value for applications like virtual reality and robotics. However, existing methods often fail to preserve unique individual characteristics or fully adhere to textual descriptions. To address these challenges, we introduce InterMoE, a novel framework built on a Dynamic Temporal-Selective Mixture of Experts. The core of InterMoE is a routing mechanism that synergistically uses both high-level text semantics and low-level motion context to dispatch temporal motion features to specialized experts. This allows experts to dynamically determine the selection capacity and focus on critical temporal features, thereby preserving specific individual characteristic identities while ensuring high semantic fidelity. Extensive experiments show that InterMoE achieves state-of-the-art performance in individual-specific high-fidelity 3D human…
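
The abstract does not specify the router's exact form. As a rough illustration only, the sketch below assumes an expert-choice style router written in pure Python: each expert scores every frame with a query that mixes its own motion weights with the text embedding, then keeps the frames whose affinity is at least the uniform level 1/T. Because of this threshold, the number of frames each expert selects varies with the input, loosely mirroring the "selection capacity" idea in the abstract. The names `route_frames`, `moe_layer`, `w_motion` and `w_text`, the elementwise text modulation, and the 1/T threshold are all hypothetical and are not taken from InterMoE.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def route_frames(
    motion: List[Vector],    # T frames, each a D-dim motion feature
    text: Vector,            # D-dim text embedding (assumed already projected to D)
    w_motion: List[Vector],  # E x D hypothetical router weights on motion context
    w_text: List[Vector],    # E x D hypothetical weights applied to the text embedding
) -> List[List[Tuple[int, float]]]:
    """Hypothetical expert-choice router (not the paper's exact design).

    Each expert scores all T frames and keeps those whose softmax affinity is at
    least the uniform level 1/T, so each expert's capacity depends on the input.
    The largest probability is always >= 1/T, so every expert keeps at least one frame.
    """
    threshold = 1.0 / len(motion)
    selections = []
    for wm, wt in zip(w_motion, w_text):
        # Text-conditioned query: the text embedding reweights which motion
        # dimensions this expert responds to, so it can change the frame ranking.
        query = [a + b * s for a, b, s in zip(wm, wt, text)]
        probs = softmax([dot(query, frame) for frame in motion])
        selections.append([(t, p) for t, p in enumerate(probs) if p >= threshold])
    return selections


def moe_layer(
    motion: List[Vector],
    text: Vector,
    w_motion: List[Vector],
    w_text: List[Vector],
    experts: List[Callable[[Vector], Vector]],
) -> List[Vector]:
    """Residual update: frames an expert did not select pass through unchanged."""
    out = [list(frame) for frame in motion]
    routed = route_frames(motion, text, w_motion, w_text)
    for expert, chosen in zip(experts, routed):
        for t, p in chosen:
            # Add the expert's output for frame t, weighted by its routing probability.
            y = expert(motion[t])
            out[t] = [o + p * v for o, v in zip(out[t], y)]
    return out
```

For example, take three frames `[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]` with `w_motion = [[1.0, 0.0], [0.0, 1.0]]` and `w_text = [[0.0, 2.0], [0.0, 0.0]]`. Switching the text embedding from `[1.0, 0.0]` to `[0.0, 1.0]` moves expert 0's selection from frame 0 to frame 1. The text enters the query multiplicatively instead of as a per-expert bias because a bias shared by all frames would cancel in the softmax over frames and could never change which frames an expert picks.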

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis