EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving

Finn Rasmus Schäfer; Yuan Gao; Dingrui Wang; Thomas Stauner; Stephan Günnemann; Mattia Piccinini; Sebastian Schmidt; Johannes Betz

arXiv:2604.22851 · cs.CV · April 28, 2026

TL;DR

EgoDyn-Bench is a diagnostic benchmark that evaluates how well vision-centric foundation models understand ego-motion physics in autonomous driving, revealing a perception bottleneck and the importance of explicit trajectory encodings.

Contribution

The paper introduces EgoDyn-Bench, a new benchmark for assessing ego-motion understanding, and uncovers a structural perception bottleneck in current models.

Findings

1. Models struggle to align physical concepts with visual observations.
2. Explicit trajectory encodings improve physical consistency across models.
3. Ego-motion logic is derived mainly from language rather than from visual data.
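The second finding concerns explicit trajectory encodings. The paper's actual encoding format is not described here, so the following is only a hypothetical illustration of what such an encoding could look like: recent ego waypoints serialized as text and prepended to a VLM prompt. The frame convention (x forward, y left, metres), the string format, and the names `encode_trajectory` and `build_prompt` are all assumptions.

```python
def encode_trajectory(waypoints: list[tuple[float, float, float]]) -> str:
    """Serialize (t, x, y) ego waypoints into a compact text string.

    Hypothetical convention: t in seconds relative to the current frame
    (negative = past), x forward and y left in metres, ego vehicle frame.
    """
    parts = [f"t={t:+.1f}s: x={x:.1f}m, y={y:.1f}m" for t, x, y in waypoints]
    return "Ego trajectory (vehicle frame): " + "; ".join(parts)


def build_prompt(waypoints: list[tuple[float, float, float]], question: str) -> str:
    """Prepend the explicit trajectory encoding to a motion question."""
    return encode_trajectory(waypoints) + "\n" + question


prompt = build_prompt([(-1.0, -10.0, 0.0), (0.0, 0.0, 0.0)], "Is the ego vehicle accelerating?")
```

Giving the model the kinematics as explicit numbers means it no longer has to infer them from pixels alone, which is one plausible reading of why such encodings would improve physical consistency.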

Abstract

While Vision-Language Models (VLMs) have advanced high-level reasoning in autonomous driving, their ability to ground this reasoning in the underlying physics of ego-motion remains poorly understood. We introduce EgoDyn-Bench, a diagnostic benchmark for evaluating the semantic ego-motion understanding of vision-centric foundation models. By mapping continuous vehicle kinematics to discrete motion concepts via a deterministic oracle, we decouple a model's internal physical logic from its visual perception. Our large-scale empirical audit spanning 20+ models, including closed-source MLLMs, open-source VLMs across multiple scales, and specialized VLAs, identifies a significant Perception Bottleneck: while models exhibit logically coherent physical concepts, they consistently fail to align them accurately with visual observations, frequently underperforming classical non-learned geometric baselines.…
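The abstract describes a deterministic oracle that maps continuous vehicle kinematics to discrete motion concepts. The exact rules are not given here, so the following is a minimal sketch under stated assumptions: the concept labels, the thresholds, the finite-difference estimate, and all function names are hypothetical, chosen only to show how such a mapping can be made deterministic and then used to score model answers.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds; EgoDyn-Bench's actual oracle rules may differ.
ACCEL_THRESHOLD = 0.5      # m/s^2, longitudinal
YAW_RATE_THRESHOLD = 0.05  # rad/s, positive = counter-clockwise (left turn)


@dataclass
class KinematicState:
    speed: float         # m/s at the end of the window
    acceleration: float  # mean longitudinal acceleration over the window, m/s^2
    yaw_rate: float      # mean yaw rate over the window, rad/s


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi] so heading differences across +/-pi stay small."""
    return math.atan2(math.sin(a), math.cos(a))


def kinematics_from_samples(times, speeds, headings) -> KinematicState:
    """Estimate window-averaged kinematics from the first and last samples.

    Assumes the total heading change over the window is below pi in magnitude.
    """
    dt = times[-1] - times[0]
    if dt <= 0:
        raise ValueError("times must be increasing")
    acceleration = (speeds[-1] - speeds[0]) / dt
    yaw_rate = wrap_angle(headings[-1] - headings[0]) / dt
    return KinematicState(speeds[-1], acceleration, yaw_rate)


def motion_concepts(state: KinematicState) -> tuple[str, str]:
    """Deterministically map continuous kinematics to (longitudinal, lateral) labels."""
    if state.acceleration > ACCEL_THRESHOLD:
        longitudinal = "accelerating"
    elif state.acceleration < -ACCEL_THRESHOLD:
        longitudinal = "decelerating"
    else:
        longitudinal = "constant_speed"
    if state.yaw_rate > YAW_RATE_THRESHOLD:
        lateral = "turning_left"
    elif state.yaw_rate < -YAW_RATE_THRESHOLD:
        lateral = "turning_right"
    else:
        lateral = "straight"
    return longitudinal, lateral


def concept_accuracy(predictions, states) -> float:
    """Fraction of model answers that exactly match the oracle's labels."""
    labels = [motion_concepts(s) for s in states]
    hits = sum(tuple(p) == label for p, label in zip(predictions, labels))
    return hits / len(labels)


state = kinematics_from_samples([0.0, 1.0, 2.0], [10.0, 11.0, 12.0], [0.0, 0.1, 0.2])
print(motion_concepts(state))  # ('accelerating', 'turning_left')
```

Because the oracle labels depend only on kinematics, the same labels could score a model answering from camera frames and the same model answering from a textual description of the kinematics; a large accuracy gap between the two would point to perception rather than physical logic. This comparison is an illustration of the decoupling idea, not necessarily the paper's exact protocol.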
