Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation
Nicanor Mayumu, Xiaoheng Deng, Patrick Mukala

TL;DR
This study evaluates whether the reasoning of Vision-Language-Action (VLA) driving models is faithful, and finds low reasoning fidelity, high trajectory fragility under visual perturbations, and low reasoning-action consistency across diverse scenarios.
Contribution
The paper provides the first systematic analysis of faithfulness in VLA driving models, introduces formal definitions of faithfulness, and outlines a safety architecture based on these insights.
Findings
Reasoning fidelity is only 42.5% overall.
Trajectory fragility under mild visual perturbations is 97.7%.
Mean reasoning-action consistency is only 48.3%.
Abstract
We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferences across 100 diverse PhysicalAI-AV scenarios. Our main finding is that the natural-language rationales these models output alongside their trajectories may be significantly unfaithful: (i) overall reasoning fidelity is only 42.5%, with Chain-of-Causation matching scene reality less than half the time; (ii) 94 missed pedestrians in one-third of pedestrian-relevant scenes; (iii) 97.7% trajectory fragility under mild visual perturbations; and (iv) only 48.3% mean reasoning-action consistency, with 53.3% of inferences exhibiting low consistency, including 37.9% of stop-claimed cases where the model continues instead. We formalize faithfulness information-theoretically, define entity and action fidelity with verification criteria, and outline a four-component safety architecture…
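To make the "stop-claimed cases where the model continues" failure mode concrete, here is a minimal Python sketch of one way such a reasoning-action check could look. It is an illustration under stated assumptions, not the paper's verification criterion: the keyword test in `claims_stop`, the `(x, y)` waypoint format, the 0.5 s waypoint spacing, and the 0.5 m/s stop threshold are all hypothetical choices made for this example.

```python
import math


def final_speed(waypoints, dt):
    """Speed in m/s over the last segment of (x, y) waypoints spaced dt seconds apart."""
    (x0, y0), (x1, y1) = waypoints[-2], waypoints[-1]
    return math.hypot(x1 - x0, y1 - y0) / dt


def claims_stop(rationale):
    # Hypothetical keyword heuristic; the paper's actual criterion is not given here.
    return "stop" in rationale.lower()


def stop_claim_consistent(rationale, waypoints, dt=0.5, stop_speed=0.5):
    """Return None if the rationale makes no stop claim; otherwise return True
    when the trajectory ends below stop_speed (assumed threshold) and False
    when the planned motion continues despite the claim."""
    if not claims_stop(rationale):
        return None
    return final_speed(waypoints, dt) < stop_speed


# Example: the rationale claims a stop, but only the first trajectory slows down.
braking = [(0.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.1, 0.0)]
cruising = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
rationale = "Pedestrian ahead in the crosswalk, so the vehicle should stop."
print(stop_claim_consistent(rationale, braking))   # stop claim matched by motion
print(stop_claim_consistent(rationale, cruising))  # stop claimed, vehicle continues
```

A check of this shape only covers one action type; a fuller consistency score would need comparable tests for other claimed maneuvers and a more robust way of extracting claims from the rationale than keyword matching.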
