Autoregressive Flow Matching for Motion Prediction
Johnathan Xie, Stefan Stojanov, Cristobal Eyzaguirre, Daniel L. K. Yamins, Jiajun Wu

TL;DR
This paper introduces autoregressive flow matching (ARFM), a probabilistic method for modeling sequential continuous data. Trained on diverse video datasets, it predicts future point tracks of complex human and robot motion over long horizons, and conditioning downstream models on these predictions improves task performance.
Contribution
The paper presents ARFM, a new autoregressive flow matching method for probabilistic motion prediction inspired by the scaling of video generation models, together with new benchmarks for evaluating human and robot motion prediction.
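This summary does not spell out ARFM's exact training objective. As a point of reference, the generic conditional flow-matching objective with a straight-line path from noise to data, applied to a single autoregressive step, can be sketched as below. The function names, the `velocity_fn(x_t, t, history)` signature, and the choice of a linear path are assumptions for illustration, not the authors' code.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical signature: velocity_fn(x_t, t, history) -> predicted velocity at x_t.
VelocityFn = Callable[[Vector, float, List[Vector]], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Point on the straight path from noise x0 to data x1, plus its velocity."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]  # d x_t / dt, constant along a straight path
    return xt, target


def flow_matching_loss(velocity_fn: VelocityFn, history: List[Vector],
                       x1: Vector, rng: random.Random) -> float:
    """Single-sample regression loss for the next element x1, conditioned on history."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                          # time t drawn from U[0, 1)
    xt, target = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t, history)
    # Mean squared error between predicted and target velocity.
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x1)
```

In a real model, `velocity_fn` would be a neural network conditioned on the observed track history, and the loss would be averaged over batches, time steps, and tracked points.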
Findings
ARFM predicts complex motions, generating future point-track locations over long horizons.
Conditioning robot action prediction and human motion prediction on predicted future tracks improves downstream task performance.
Code and models are publicly available for reproducibility.
Abstract
Motion prediction has been studied in different contexts with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Simultaneously, recent efforts in scaling video prediction have demonstrated impressive visual realism, yet they struggle to accurately model complex motions despite massive scale. Inspired by the scaling of video generation, we develop autoregressive flow matching (ARFM), a new method for probabilistic modeling of sequential continuous data and train it on diverse video datasets to generate future point track locations over long horizons. To evaluate our model, we develop benchmarks for evaluating the ability of motion prediction models to predict human and robot motion. Our model is able to predict complex motions, and we demonstrate that conditioning robot action prediction and human motion prediction on…
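Generating future point tracks autoregressively means sampling one step at a time and feeding each sample back in as context. The following is a minimal sketch, assuming a trained velocity field and plain Euler integration from noise (t = 0) to data (t = 1). `rollout`, `sample_next`, and `toy_velocity` are hypothetical names. The toy field is a stand-in for a learned network: it is the exact straight-path velocity when the next point is deterministically the constant-velocity extrapolation of the last two points.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float, List[Vector]], Vector]


def toy_velocity(x: Vector, t: float, history: List[Vector]) -> Vector:
    """Hypothetical stand-in for a learned model (needs at least two history points)."""
    last, prev = history[-1], history[-2]
    mu = [2.0 * a - b for a, b in zip(last, prev)]  # constant-velocity extrapolation
    # On the path x_t = (1 - t) x0 + t mu, the velocity mu - x0 equals (mu - x_t) / (1 - t).
    return [(m - xi) / (1.0 - t) for m, xi in zip(mu, x)]


def sample_next(velocity_fn: VelocityFn, history: List[Vector], dim: int,
                num_steps: int, rng: random.Random) -> Vector:
    """Integrate dx/dt = v(x, t | history) from noise at t = 0 to t = 1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # stays below 1, so toy_velocity never divides by zero
        v = velocity_fn(x, t, history)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def rollout(velocity_fn: VelocityFn, history: List[Vector], horizon: int,
            num_steps: int = 20, seed: int = 0) -> List[Vector]:
    """Autoregressively predict `horizon` future points of a single track."""
    rng = random.Random(seed)
    track = [list(p) for p in history]
    for _ in range(horizon):
        nxt = sample_next(velocity_fn, track, len(track[-1]), num_steps, rng)
        track.append(nxt)  # the new sample becomes context for the next step
    return track[len(history):]
```

For example, `rollout(toy_velocity, [[0.0, 0.0], [1.0, 0.5]], horizon=3)` returns points very close to `[[2.0, 1.0], [3.0, 1.5], [4.0, 2.0]]`. With a learned, stochastic velocity field, different seeds would instead yield different plausible futures, which is what makes the prediction probabilistic.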
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
