InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Yebin Yang; Di Wen; Lei Qi; Weitong Kong; Junwei Zheng; Ruiping Liu; Yufan Chen; Chengzhi Wu; Kailun Yang; Yuqian Fu; Danda Pani Paudel; Luc Van Gool; Kunyu Peng

arXiv:2603.13082 · cs.CV · March 16, 2026

Open Access

TL;DR

This paper introduces InterEdit3D, a new dataset for text-guided multi-person 3D motion editing, together with the TMME benchmark and InterEdit, a diffusion model that captures interaction cues and periodic motion dynamics, advancing multi-human motion editing.

Contribution

The paper presents a new dataset, a benchmark, and a diffusion-based model for text-guided multi-person 3D motion editing, addressing the complexity of inter-person interactions.

Findings

1. InterEdit improves text-to-motion consistency.
2. InterEdit achieves state-of-the-art performance in multi-human motion editing.
3. The dataset and code are publicly released.

Abstract

Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings is less explored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, where a target motion is generated from a source and a text instruction. To support this, we propose InterEdit3D, a new dataset with manual two-person motion change annotations, and a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment with learnable tokens to capture high-level interaction cues and an Interaction-Aware Frequency Token Alignment strategy using DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion…
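The abstract names two mechanisms without detailing them: frequency tokens built with a DCT and energy pooling to capture periodic motion, and classifier-free conditioning in the diffusion model. The NumPy sketch below illustrates both generic ideas under stated assumptions and is not the authors' implementation. It assumes each person's motion is a (T, D) array of per-frame features, that "energy pooling" means averaging squared DCT coefficients over contiguous low-to-high frequency bands, and that guidance follows the standard classifier-free formula (the paper's "synchronized" variant is not reproduced). The names `dct_ii`, `frequency_tokens`, `classifier_free_guidance`, and `num_bands` are hypothetical.

```python
import numpy as np


def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along axis 0 (time) of a (T, D) motion array."""
    T = x.shape[0]
    n = np.arange(T)
    k = n[:, None]
    # basis[k, n] = cos(pi * (n + 0.5) * k / T): row k is the k-th cosine
    basis = np.cos(np.pi * (n[None, :] + 0.5) * k / T)
    # orthonormal scaling so that total energy is preserved (Parseval)
    scale = np.full((T, 1), np.sqrt(2.0 / T))
    scale[0, 0] = np.sqrt(1.0 / T)
    return (scale * basis) @ x


def frequency_tokens(motion: np.ndarray, num_bands: int) -> np.ndarray:
    """Hypothetical frequency tokens: log mean DCT energy per contiguous band.

    Assumption, not the paper's exact definition.
    motion: (T, D) per-frame features of one person.
    Returns: (num_bands, D), one token per frequency band.
    """
    T = motion.shape[0]
    if not 1 <= num_bands <= T:
        raise ValueError("num_bands must be between 1 and the number of frames")
    energy = dct_ii(motion) ** 2  # energy of each DCT coefficient, (T, D)
    # low-to-high frequency bands; energy pooling = mean energy inside each band
    bands = np.array_split(energy, num_bands, axis=0)
    pooled = np.stack([band.mean(axis=0) for band in bands])
    return np.log1p(pooled)  # compress the dynamic range of band energies


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: extrapolate toward the conditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two people, 64 frames, 6 features each (illustrative sizes only)
    person_a = rng.standard_normal((64, 6))
    person_b = rng.standard_normal((64, 6))
    tokens = np.stack([frequency_tokens(p, num_bands=8) for p in (person_a, person_b)])
    print(tokens.shape)  # (2, 8, 6)
```

Because the DCT is orthonormal, a periodic component of the motion shows up as energy concentrated in a few coefficients, so pooling by band gives a compact per-person summary of rhythm that could be compared across the two people. A guidance scale of 1 returns the conditional prediction unchanged; larger values weight the text condition more heavily.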

Taxonomy

Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Human Pose and Action Recognition