EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation
Abhishek Saroha, Huajian Zeng, Xingxing Zuo, Daniel Cremers, Xi Wang

TL;DR
EgoFlow is a novel flow-matching framework that generates realistic, physically plausible 6DoF object trajectories from egocentric videos, integrating physical constraints and multimodal observations.
Contribution
It introduces a hybrid architecture combining Mamba, Transformer, and Perceiver models with gradient-guided inference for physically consistent motion synthesis.
Findings
Outperforms diffusion and transformer baselines in accuracy and realism.
Reduces collision rates by up to 79%.
Generalizes well to unseen scenes.
Abstract
Understanding and predicting object motion from egocentric video is fundamental to embodied perception and interaction. However, generating physically consistent 6DoF trajectories remains challenging due to occlusions, fast motion, and the lack of explicit physical reasoning in existing generative models. We present EgoFlow, a flow-matching framework that synthesizes realistic and physically plausible trajectories conditioned on multimodal egocentric observations. EgoFlow employs a hybrid Mamba-Transformer-Perceiver architecture to jointly model temporal dynamics, scene geometry, and semantic intent, while a gradient-guided inference process enforces differentiable physical constraints such as collision avoidance and motion smoothness. This combination yields coherent and controllable motion generation without post-hoc filtering or additional supervision. Experiments on real-world…
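The gradient-guided inference idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple Euler integration of the flow ODE, a toy constant velocity field standing in for the learned network, a single spherical obstacle, and a trajectory of 3D positions only (rotations are omitted). All names here (`collision_grad`, `smoothness_grad`, `guided_sample`, the guidance weights) are hypothetical and chosen for illustration.

```python
import numpy as np

def collision_grad(x, center, radius):
    """Gradient of 0.5 * sum_i max(0, radius - ||x_i - center||)^2 w.r.t. x."""
    diff = x - center
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    penetration = np.maximum(0.0, radius - dist)
    return -penetration * diff / np.maximum(dist, 1e-8)

def smoothness_grad(x):
    """Gradient of 0.5 * sum_i ||x_{i+2} - 2 x_{i+1} + x_i||^2 w.r.t. x."""
    acc = x[2:] - 2.0 * x[1:-1] + x[:-2]
    g = np.zeros_like(x)
    g[2:] += acc
    g[1:-1] -= 2.0 * acc
    g[:-2] += acc
    return g

def guided_sample(x0, velocity_fn, guidance_grad_fn, steps=50):
    """Euler-integrate dx/dt = v(x, t) - grad C(x) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * (velocity_fn(x, t) - guidance_grad_fn(x))
    return x

# Toy setup (assumed): 20 waypoints whose unguided result is a straight
# line that passes through a sphere of radius 0.3 at the origin.
rng = np.random.default_rng(0)
T = 20
target = np.stack([np.linspace(-1.0, 1.0, T), np.full(T, 0.05), np.zeros(T)], axis=1)
x0 = rng.normal(size=(T, 3))          # noise sample at t = 0
center, radius = np.zeros(3), 0.3

# Stand-in for a learned velocity network: the straight-path velocity x1 - x0.
velocity = lambda x, t: target - x0
# Weighted sum of the two differentiable constraint gradients (weights assumed).
guidance = lambda x: 50.0 * collision_grad(x, center, radius) + 0.1 * smoothness_grad(x)
no_guidance = lambda x: np.zeros_like(x)

def max_penetration(x):
    """Deepest penetration of any waypoint into the sphere (0 if none)."""
    return float(np.max(np.maximum(0.0, radius - np.linalg.norm(x - center, axis=-1))))

unguided = guided_sample(x0, velocity, no_guidance)
guided = guided_sample(x0, velocity, guidance)
print("max penetration, unguided:", max_penetration(unguided))
print("max penetration, guided:  ", max_penetration(guided))
```

In this sketch the constraints act only at sampling time: the velocity field is left untouched and the cost gradients are subtracted during integration, which is one plausible reading of enforcing constraints "without post-hoc filtering or additional supervision". A real system would replace `velocity` with the trained model and the sphere with scene geometry.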
