Autoregressive Flow Matching for Motion Prediction

Johnathan Xie; Stefan Stojanov; Cristobal Eyzaguirre; Daniel L. K. Yamins; Jiajun Wu

arXiv:2512.22688 · cs.CV · December 30, 2025

TL;DR

This paper introduces autoregressive flow matching (ARFM), a novel probabilistic model trained on diverse video datasets to accurately predict complex human and robot motions over long horizons, improving downstream task performance.

Contribution

The paper presents ARFM, a new autoregressive flow matching method for probabilistic motion prediction, inspired by large-scale video generation techniques, with benchmarks for human and robot motion prediction.
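Flow matching trains a network to regress a velocity field that transports noise to data. The sketch below builds the training targets under one common assumption, the straight-line (rectified-flow style) interpolation path; this summary does not say whether ARFM uses that exact path. In an autoregressive model the velocity network would also receive the past trajectory as conditioning. That network is omitted here, and the flattened (B, D) layout for future point-track offsets is an illustrative assumption, not the paper's format.

```python
import numpy as np

def flow_matching_batch(x1, rng):
    """Build one conditional flow-matching training batch (illustrative sketch).

    x1: array of shape (B, D), assumed here to be flattened future
        point-track offsets; the real data layout may differ.
    Returns (x_t, t, u, x0): interpolated input, sampled times,
    regression target velocity, and the noise endpoint.
    """
    b = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)        # noise endpoint at t = 0
    t = rng.uniform(0.0, 1.0, size=(b, 1))    # one time per sample
    x_t = (1.0 - t) * x0 + t * x1             # assumed straight-line path
    u = x1 - x0                               # constant velocity along that path
    return x_t, t, u, x0

def flow_matching_loss(v_pred, u):
    """Mean (over the batch) squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - u) ** 2, axis=-1)))

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 6))
x_t, t, u, x0 = flow_matching_batch(x1, rng)
print(flow_matching_loss(u, u))  # 0.0 for an oracle that predicts u exactly
```

Because the path is a straight line, moving from `x_t` along `u` for the remaining time `1 - t` lands exactly on the data point `x1`; this is what makes the constant velocity a valid regression target.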

Findings

01

ARFM effectively predicts complex motions over long horizons.

02

Conditioning on predicted future tracks enhances downstream task performance.

03

Code and models are publicly available for reproducibility.

Abstract

Motion prediction has been studied in different contexts with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Simultaneously, recent efforts in scaling video prediction have demonstrated impressive visual realism, yet they struggle to accurately model complex motions despite massive scale. Inspired by the scaling of video generation, we develop autoregressive flow matching (ARFM), a new method for probabilistic modeling of sequential continuous data and train it on diverse video datasets to generate future point track locations over long horizons. To evaluate our model, we develop benchmarks for evaluating the ability of motion prediction models to predict human and robot motion. Our model is able to predict complex motions, and we demonstrate that conditioning robot action prediction and human motion prediction on…
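The abstract describes generating future point-track locations autoregressively with flow matching. A minimal sketch of one generic way to do this follows. Each future step is sampled by Euler-integrating a velocity field from Gaussian noise (t = 0) to a sample (t = 1), conditioned on all positions so far, and is then appended to the context. `velocity_fn` stands in for a trained network. The closed-form `constant_velocity_oracle` is hypothetical and exists only to make the example runnable. The one-step-at-a-time rollout, the Euler step count, and the (T, N, 2) track layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def euler_sample(velocity_fn, context, shape, rng, num_steps=10):
    """Integrate dx/dt = v(x, t, context) from noise at t=0 to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, context)
    return x

def rollout_tracks(velocity_fn, history, horizon, rng, num_steps=10):
    """Autoregressively predict `horizon` future steps of point tracks.

    history: (T, N, 2) array of past 2D positions for N tracked points
    (an assumed layout). Each new step is conditioned on everything
    observed or generated so far.
    """
    tracks = [p for p in history]
    for _ in range(horizon):
        context = np.stack(tracks)  # (T_so_far, N, 2)
        nxt = euler_sample(velocity_fn, context, tracks[-1].shape, rng, num_steps)
        tracks.append(nxt)
    return np.stack(tracks[len(history):])  # (horizon, N, 2)

def constant_velocity_oracle(x, t, context):
    """Hypothetical stand-in for a trained network: the exact straight-line
    velocity toward a constant-velocity extrapolation of the last two steps."""
    target = 2.0 * context[-1] - context[-2]
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
history = np.array([[[0.0, 0.0]], [[1.0, 0.5]]])  # T=2 steps, N=1 point
future = rollout_tracks(constant_velocity_oracle, history, horizon=3, rng=rng)
print(future[:, 0])  # close to [[2, 1], [3, 1.5], [4, 2]]
```

With this oracle, the final Euler step covers exactly the remaining distance to the extrapolated target, so the rollout reproduces linear extrapolation. A learned network would instead typically map different noise draws to different futures, which is what makes the prediction probabilistic.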

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis