InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba
Zizhao Wu, Yingying Sun, Yiming Chen, Xiaoling Gu, Ruyu Liu, Jiazhou Chen

TL;DR
InterMamba introduces an adaptive spatio-temporal framework for human-human interaction generation that improves both efficiency and accuracy over existing transformer-based methods by using two parallel SSM branches and adaptive modules.
Contribution
The paper presents a novel, efficient Mamba-based approach with adaptive mechanisms for capturing long-term dependencies in human interactions, outperforming prior transformer-based models.
Findings
Achieves state-of-the-art results on interaction datasets.
Reduces model size to 66M parameters, 36% of InterGen's.
Reduces inference time to 0.57 seconds, 46% faster than InterGen.
Abstract
Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these issues, we propose a novel, efficient human-human interaction generation method based on the Mamba framework, designed to meet the demands of effectively capturing long-sequence dependencies while providing real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual…
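The abstract describes two parallel SSM branches, one over the spatial axis and one over the temporal axis, whose outputs are merged by an adaptive mechanism. The NumPy sketch below is a toy illustration of that idea under stated assumptions, not the authors' implementation. It uses a plain diagonal linear SSM with fixed per-channel parameters (a non-selective simplification of Mamba, whose parameters are input-dependent). The temporal branch scans frames for each joint, and the spatial branch scans joints within each frame. The sigmoid gate in the hypothetical `adaptive_fuse` is an assumed fusion rule. All function names and tensor sizes are illustrative.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear SSM scanned along axis 0 (simplified, non-selective).
    x: (L, D) sequence; a, b, c: (D,) per-channel parameters.
    Recurrence: h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    """
    h = np.zeros(x.shape[1])
    ys = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        ys[t] = c * h
    return ys

def temporal_branch(x, params):
    # x: (T, J, D) motion features; scan over time T separately for each joint
    T, J, D = x.shape
    out = np.empty_like(x, dtype=float)
    for j in range(J):
        out[:, j, :] = ssm_scan(x[:, j, :], *params)
    return out

def spatial_branch(x, params):
    # scan over joints J separately within each frame
    T, J, D = x.shape
    out = np.empty_like(x, dtype=float)
    for t in range(T):
        out[t] = ssm_scan(x[t], *params)
    return out

def adaptive_fuse(yt, ys, w):
    # Hypothetical adaptive fusion: a per-channel gate in (0, 1) computed from
    # globally pooled features of both branches, then a convex combination.
    pooled = np.concatenate([yt.mean(axis=(0, 1)), ys.mean(axis=(0, 1))])  # (2D,)
    g = 1.0 / (1.0 + np.exp(-(pooled @ w)))  # (D,)
    return g * yt + (1.0 - g) * ys

def make_params(rng, D):
    a = rng.uniform(0.5, 0.95, D)  # decay kept in (0, 1) so the scan stays stable
    b = rng.standard_normal(D)
    c = rng.standard_normal(D)
    return a, b, c

# Toy usage: 16 frames, 22 joints, 8 feature channels (arbitrary sizes)
rng = np.random.default_rng(0)
T, J, D = 16, 22, 8
x = rng.standard_normal((T, J, D))
yt = temporal_branch(x, make_params(rng, D))
ys = spatial_branch(x, make_params(rng, D))
w = 0.1 * rng.standard_normal((2 * D, D))  # hypothetical gate projection
y = adaptive_fuse(yt, ys, w)
print(y.shape)  # (16, 22, 8)
```

Because the gate produces a convex combination, each fused value lies between the two branch outputs. The model can therefore weight spatial versus temporal context per channel instead of summing the two branches with a fixed ratio. Each scan is linear in its sequence length, which is the property Mamba-style models rely on for efficiency on long sequences.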
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Hand Gesture Recognition Systems
