RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

Sen Zhang; Runmei Li; Shizhuang Deng; Zhichao Zheng; Yuhe Zhang; Jiani Li; Kailun Zhang; Tao Zhang; Wenjun Wu; Qunbo Wang

arXiv:2603.27112 · cs.CV · April 24, 2026

TL;DR

RailVQA introduces a benchmark and a framework for interpretable visual cognition in Automatic Train Operation (ATO), targeting safety-critical perception and reasoning challenges with efficient, generalizable models.

Contribution

It presents RailVQA-bench, a VQA benchmark for cab-view railway scenarios, and RailVQA-CoM, a collaborative framework that combines small and large models to improve both efficiency and cognitive capability.
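
This summary does not say how RailVQA-CoM divides work between its small and large models. Purely as an illustration, the sketch below shows one generic small/large collaboration pattern, a confidence-gated cascade: a cheap small model answers first, and a large multimodal model is consulted only when the small model is unsure, which speaks to the computational-cost concern raised in the abstract. The `Model` interface, `cascade_answer`, the `threshold` value, and both stub models are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# ASSUMPTION: hypothetical model interface mapping (image_id, question)
# to (answer, confidence in [0, 1]). Not an API from the RailVQA repository.
Model = Callable[[str, str], Tuple[str, float]]


@dataclass
class CascadeResult:
    answer: str
    used_large_model: bool


def cascade_answer(small: Model, large: Model, image_id: str, question: str,
                   threshold: float = 0.8) -> CascadeResult:
    """Try the small model first; fall back to the large model only when the
    small model's confidence is below the (hypothetical) threshold."""
    answer, confidence = small(image_id, question)
    if confidence >= threshold:
        return CascadeResult(answer, used_large_model=False)
    large_answer, _ = large(image_id, question)
    return CascadeResult(large_answer, used_large_model=True)


# Hypothetical stubs standing in for a small perception model and a large LMM.
def small_stub(image_id: str, question: str) -> Tuple[str, float]:
    if question == "Is the signal aspect red?":
        return "yes", 0.95  # confident on a simple perception query
    return "unsure", 0.30   # low confidence on a reasoning query


def large_stub(image_id: str, question: str) -> Tuple[str, float]:
    return "reduce speed and prepare to stop", 0.90


if __name__ == "__main__":
    simple = cascade_answer(small_stub, large_stub, "frame_001", "Is the signal aspect red?")
    hard = cascade_answer(small_stub, large_stub, "frame_001", "What should the train do next?")
    print(simple)  # CascadeResult(answer='yes', used_large_model=False)
    print(hard)    # CascadeResult(answer='reduce speed and prepare to stop', used_large_model=True)
```

A cascade of this kind lowers average cost by reserving the large model for hard queries; whether RailVQA-CoM uses confidence gating, task decomposition, or another collaboration scheme should be checked against the paper itself.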

Findings

01. Significant performance improvements on visual perception and reasoning tasks.
02. Enhanced interpretability and efficiency in automatic train operation systems.
03. Better cross-domain generalization, demonstrated experimentally.

Abstract

As Automatic Train Operation (ATO) advances toward GoA4 and beyond, it increasingly depends on efficient, reliable cab-view visual perception and decision-oriented inference to ensure safe operation in complex and dynamic railway environments. However, existing approaches focus primarily on basic perception and often generalize poorly to rare yet safety-critical corner cases. They also lack the high-level reasoning and planning capabilities required for operational decision-making. Although recent Large Multi-modal Models (LMMs) show strong generalization and cognitive capabilities, their use in safety-critical ATO is hindered by high computational cost and hallucination risk. Meanwhile, reliable domain-specific benchmarks for systematically evaluating cognitive capabilities are still lacking. To address these gaps, we introduce RailVQA-bench, the first VQA benchmark for cab-view visual…

Code & Models

Repositories

https://cybereye-bjtu.github.io/RailVQA.html
