Can Image-To-Video Models Simulate Pedestrian Dynamics?

Aaron Appelle; Jerome P. Lynch

arXiv:2510.17731·cs.CV·October 21, 2025

TL;DR

This paper tests whether image-to-video diffusion transformer models can realistically simulate pedestrian movement in crowded scenes. It conditions the models on keyframes from pedestrian trajectory benchmarks and evaluates their trajectory prediction performance.

Contribution

It introduces a framework for assessing I2V models' ability to generate realistic pedestrian dynamics conditioned on keyframes from benchmark datasets.

Findings

01

I2V models can produce plausible pedestrian trajectories.

02

Quantitative evaluation shows competitive prediction accuracy.

03

Models demonstrate potential for crowd simulation applications.

Abstract

Recent high-performing image-to-video (I2V) models based on variants of the diffusion transformer (DiT) have displayed remarkable inherent world-modeling capabilities by virtue of training on large-scale video datasets. We investigate whether these models can generate realistic pedestrian movement patterns in crowded public scenes. Our framework conditions I2V models on keyframes extracted from pedestrian trajectory benchmarks, then evaluates their trajectory prediction performance using quantitative measures of pedestrian dynamics.
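The abstract does not say which quantitative measures are used. As a rough illustration of what scoring trajectory prediction can look like, the sketch below computes average and final displacement error (ADE and FDE), two metrics widely used on pedestrian trajectory benchmarks. Choosing ADE and FDE is an assumption here, not a detail confirmed by the paper, and the function name `displacement_errors` and the sample coordinates are hypothetical.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) positions.

    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the last time step.
    These metrics are an assumed choice, not taken from the paper.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(dists) / len(dists), dists[-1]

# Hypothetical example: one pedestrian tracked over three time steps.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, gt)
print(f"ADE={ade:.2f}, FDE={fde:.2f}")
```

In a pipeline like the one the abstract describes, the predicted positions would presumably come from tracking pedestrians in the generated video, and the ground truth from the benchmark annotations. Both steps are outside this sketch.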

Taxonomy

Topics: Evacuation and Crowd Dynamics · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis