Evaluating Video Models as Simulators of Multi-Person Pedestrian Trajectories

Aaron Appelle; Jerome P. Lynch

arXiv:2510.20182·cs.CV·October 24, 2025

TL;DR

This paper evaluates how well current text-to-video and image-to-video models simulate realistic multi-person pedestrian trajectories, revealing their strengths and limitations in modeling complex multi-agent dynamics.

Contribution

It introduces a novel evaluation protocol and a method to reconstruct pedestrian trajectories from generated videos, enabling systematic benchmarking of multi-agent behavior in video models.

Findings

1. Models learn effective priors for multi-agent plausibility
2. Failure modes include merging and disappearing pedestrians
3. Benchmarking reveals strengths and areas for improvement

Abstract

Large-scale video generation models have demonstrated high visual realism in diverse contexts, spurring interest in their potential as general-purpose world simulators. Existing benchmarks focus on individual subjects rather than scenes with multiple interacting people, so the plausibility of multi-agent dynamics in generated videos remains unverified. We propose a rigorous evaluation protocol to benchmark text-to-video (T2V) and image-to-video (I2V) models as implicit simulators of pedestrian dynamics. For I2V, we leverage start frames from established datasets to enable comparison with a ground-truth video dataset. For T2V, we develop a prompt suite to explore diverse pedestrian densities and interactions. A key component is a method to reconstruct 2D bird's-eye view trajectories from pixel space without known camera parameters. Our analysis reveals that leading models have…
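The abstract does not say how the bird's-eye view reconstruction is performed, so the sketch below should not be read as the authors' method. It only illustrates the general idea of mapping pixel positions onto a ground plane, under the assumption that a planar homography is somehow available. The function name `pixels_to_ground`, the matrix `H_example`, and the sample track are all hypothetical, and the numeric values are made up for illustration.

```python
import numpy as np


def pixels_to_ground(points_px: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates to (N, 2) ground-plane coordinates.

    H is a 3x3 planar homography. How H would be obtained without known
    camera parameters is outside the scope of this sketch (assumption).
    """
    pts = np.asarray(points_px, dtype=float)
    # Append a column of ones to get homogeneous coordinates, shape (N, 3).
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # Apply the homography to every point at once.
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the third coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical homography and pixel-space track (e.g. foot points of one
# pedestrian over three frames). Values are illustrative only.
H_example = np.array([
    [0.02, 0.0, -6.4],
    [0.0, 0.05, -12.0],
    [0.0, 0.001, 1.0],
])
track_px = np.array([[320.0, 400.0], [330.0, 380.0], [340.0, 360.0]])
track_bev = pixels_to_ground(track_px, H_example)
print(track_bev)
```

Each row of `track_bev` is one 2D ground-plane position, so a per-pedestrian trajectory is simply the sequence of rows over time.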

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Autonomous Vehicle Technology and Safety · Evacuation and Crowd Dynamics