ARMFlow: AutoRegressive MeanFlow for Online 3D Human Reaction Generation
Zichen Geng, Zeeshan Hayder, Wei Liu, Hesheng Wang, and Ajmal Mian

TL;DR
ARMFlow is a novel autoregressive framework for online 3D human reaction generation that achieves high fidelity, real-time inference, and adaptability by modeling temporal dependencies and reducing error accumulation.
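In the online setting, each reactor frame must be produced as soon as the matching actor frame arrives, using only the past. The sketch below shows that causal loop. Everything in it is an assumption for illustration: scalar "frames" stand in for pose vectors, and `online_reaction` and `toy_predict` are hypothetical names. The toy predictor is a simple blend rule, not ARMFlow's learned model.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical predictor signature: (actor history, reactor history) -> next reactor frame.
Predictor = Callable[[Sequence[float], Sequence[float]], float]


def online_reaction(actor_stream: Iterable[float], predict: Predictor) -> List[float]:
    """Generate one reactor frame per incoming actor frame.

    At step t the predictor sees only actor frames 0..t and the reactor
    frames already produced (0..t-1). It never sees future actor frames.
    """
    actor_hist: List[float] = []
    reactor_hist: List[float] = []
    for frame in actor_stream:
        actor_hist.append(frame)  # newest actor observation arrives
        # Pass copies so the predictor cannot mutate the running histories.
        reactor_hist.append(predict(list(actor_hist), list(reactor_hist)))
    return reactor_hist


def toy_predict(actor: Sequence[float], reactor: Sequence[float]) -> float:
    """Illustrative stand-in for a learned model: average the actor's latest
    frame with the reactor's previous frame (0.0 before the first frame)."""
    prev = reactor[-1] if reactor else 0.0
    return 0.5 * actor[-1] + 0.5 * prev


print(online_reaction([1.0, 2.0, 3.0], toy_predict))  # [0.5, 1.25, 2.125]
```

Changing a later actor frame (for example replacing 3.0 with 99.0) leaves the first two reactor frames unchanged, which is the causality property an online generator needs.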
Contribution
It introduces ARMFlow, which pairs Bootstrap Contextual Encoding (BSCE) with a causal context encoder to enable online generation with higher accuracy and lower latency, outperforming existing methods.
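Bootstrap Contextual Encoding (BSCE), as summarized in the abstract, conditions the model during training on its own generated history rather than the ground-truth history, so that training sees the same kind of imperfect context it will face at inference. The sketch below contrasts this with teacher forcing on scalar toy sequences. It is a minimal illustration under stated assumptions: the names `training_contexts` and `training_loss` are hypothetical, and details such as how many steps are rolled out or whether gradients flow through the generated history are not specified in this summary.

```python
from typing import Callable, List

# Hypothetical model interface: reactor history -> next reactor frame.
Predictor = Callable[[List[float]], float]


def training_contexts(gt: List[float], predict: Predictor, bootstrap: bool) -> List[List[float]]:
    """Return the history the model is conditioned on at each training step.

    bootstrap=False: teacher forcing, the history is ground truth.
    bootstrap=True:  BSCE-style, the history is the model's own earlier outputs.
    """
    contexts: List[List[float]] = []
    history: List[float] = []
    for target in gt:
        contexts.append(list(history))
        history.append(predict(history) if bootstrap else target)
    return contexts


def training_loss(gt: List[float], predict: Predictor, bootstrap: bool) -> float:
    """Mean squared error against ground-truth targets. Only the context
    changes between the two modes; the targets are always ground truth."""
    contexts = training_contexts(gt, predict, bootstrap)
    preds = [predict(c) for c in contexts]
    return sum((p - g) ** 2 for p, g in zip(preds, gt)) / len(gt)
```

For example, with a toy predictor that adds 1.0 to the last frame and ground truth `[1.0, 2.5, 3.0]`, teacher forcing conditions the third step on `[1.0, 2.5]`, while the bootstrap mode conditions it on the model's own `[1.0, 2.0]`.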
Findings
30% improvement in FID over existing online methods
State-of-the-art offline performance with ReMFlow
Real-time, high-fidelity online generation
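FID (Fréchet Inception Distance, adapted to motion by using features from a pretrained motion encoder) fits a Gaussian to the features of real and of generated motions and measures the distance between them; lower is better. The general form is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The sketch below assumes diagonal covariances so the matrix square root becomes elementwise; real evaluations use full covariance matrices (for example via `scipy.linalg.sqrtm`). The function name `fid_diagonal` is hypothetical.

```python
import math
from typing import Sequence


def fid_diagonal(mu1: Sequence[float], var1: Sequence[float],
                 mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Frechet distance between two Gaussians with diagonal covariances.

    For diagonal S1, S2 the product S1 S2 is diagonal, so its square root
    is the elementwise sqrt(s1 * s2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(s1 + s2 - 2.0 * math.sqrt(s1 * s2) for s1, s2 in zip(var1, var2))
    return mean_term + cov_term


print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [4.0, 1.0]))  # 2.0
```

Identical statistics give 0.0; both a shifted mean and a mismatched variance increase the score.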
Abstract
3D human reaction generation faces three main challenges: (1) high motion fidelity, (2) real-time inference, and (3) autoregressive adaptability for online scenarios. Existing methods fail to meet all three simultaneously. We propose ARMFlow, a MeanFlow-based autoregressive framework that models temporal dependencies between actor and reactor motions. It consists of a causal context encoder and an MLP-based velocity predictor. We introduce Bootstrap Contextual Encoding (BSCE) during training, which encodes the generated history instead of the ground-truth history to alleviate error accumulation in autoregressive generation. We further introduce ReMFlow, an offline variant that achieves state-of-the-art performance with the fastest inference among offline methods. Our ARMFlow addresses key limitations of online settings by: (1) enhancing semantic alignment via a global contextual encoder; (2) achieving…
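MeanFlow-style models predict an average velocity u(z_t, r, t) over the interval [r, t] instead of an instantaneous velocity, so a sample can jump from time t to time r in one network evaluation: z_r = z_t - (t - r) u(z_t, r, t). With the linear path z_t = (1 - t) x + t ε (noise at t = 1, data at t = 0), a single step from t = 1 to t = 0 produces a frame, which is what makes real-time inference plausible. The sketch below is an assumption-laden illustration, not ARMFlow's implementation: frames are scalars, the context is a plain list, and `oracle_u` is a hypothetical closed-form stand-in for the learned MLP velocity predictor (it is exact only when the target distribution is a point mass at `ctx[-1]`).

```python
import random
from typing import Callable, List

# Hypothetical average-velocity model: u(z, r, t, context) -> average velocity on [r, t].
AvgVelocity = Callable[[float, float, float, List[float]], float]


def meanflow_step(u: AvgVelocity, z_t: float, r: float, t: float, ctx: List[float]) -> float:
    """Jump from time t to time r with one evaluation: z_r = z_t - (t - r) * u."""
    return z_t - (t - r) * u(z_t, r, t, ctx)


def generate_frame(u: AvgVelocity, ctx: List[float], rng: random.Random) -> float:
    """One-step generation: draw noise at t=1 and jump straight to t=0."""
    z1 = rng.gauss(0.0, 1.0)
    return meanflow_step(u, z1, 0.0, 1.0, ctx)


def oracle_u(z: float, r: float, t: float, ctx: List[float]) -> float:
    """Exact average velocity when the data is a point mass x = ctx[-1]:
    on the path z_t = (1 - t) x + t * eps the velocity eps - x is constant,
    and eps - x = (z_t - x) / t. Stand-in for a trained network; r is unused
    because the velocity does not vary along this straight path."""
    return (z - ctx[-1]) / t


frame = generate_frame(oracle_u, [0.2, 0.7], random.Random(0))
print(round(frame, 6))  # 0.7
```

With an exact average velocity, one step and two half-steps (1 to 0.5, then 0.5 to 0) reach the same frame; a trained network only approximates this consistency.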
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Robot Manipulation and Learning
