CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems
Yonglin Tian, Qiyao Zhang, Wei Xu, Yutong Wang, Yihao Wu, Xinyi Li, Xingyuan Dai, Hui Zhang, Zhiyong Cui, Baoqing Guo, Zujun Yu, and Yisheng Lv

TL;DR
The paper introduces CogRail, a benchmark, together with a joint fine-tuning framework for visual-language models (VLMs), to improve cognitive intrusion perception in railway safety, emphasizing spatio-temporal reasoning and domain-specific adaptation.
Contribution
The paper presents a new benchmark, CogRail, and a multi-task fine-tuning framework that adapts VLMs to complex, safety-critical spatio-temporal reasoning tasks in railway systems.
Findings
Current VLMs struggle with the spatio-temporal reasoning that intrusion perception requires.
Joint fine-tuning improves model performance and interpretability.
Structured multi-task learning benefits domain-specific safety applications.
Abstract
Accurate and early perception of potential intrusion targets is essential for ensuring the safety of railway transportation systems. However, most existing systems focus narrowly on object classification within fixed visual scopes and apply rule-based heuristics to determine intrusion status, often overlooking targets that pose latent intrusion risks. Anticipating such risks requires the cognition of spatial context and temporal dynamics for the object of interest (OOI), which presents challenges for conventional visual models. To facilitate deep intrusion perception, we introduce a novel benchmark, CogRail, which integrates curated open-source datasets with cognitively driven question-answer annotations to support spatio-temporal reasoning and prediction. Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models (VLMs) using multimodal…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Data Visualization and Analytics
