Can a Second-View Image Be a Language? Geometric and Semantic Cross-Modal Reasoning for X-ray Prohibited Item Detection
Chuang Peng, Renshuai Tao, Zhongwei Ren, Xianglong Liu, Yunchao Wei

TL;DR
This paper introduces a new benchmark and a multimodal model that treats second-view X-ray images as a language-like modality, enhancing prohibited item detection through cross-view and cross-modal reasoning.
Contribution
The paper presents DualXrayBench, a comprehensive benchmark with dual-view images and captions, and proposes GSR, a model that leverages cross-view geometry and semantics as a language-like modality for improved detection.
Findings
GSR significantly outperforms existing methods on X-ray detection tasks.
DualXrayBench provides a new dataset and evaluation framework for multi-view X-ray analysis.
Treating second-view images as a language-like modality improves reasoning capabilities.
Abstract
Automatic X-ray prohibited item detection is vital for security inspection and has been widely studied. Traditional methods rely on the visual modality alone and often struggle with complex threats. While recent studies incorporate language to guide single-view images, human inspectors typically use dual-view images in practice. This raises the question: can the second view provide constraints similar to a language modality? In this work, we introduce DualXrayBench, the first comprehensive benchmark for X-ray inspection that includes multiple views and modalities. It supports eight tasks designed to test cross-view reasoning. In DualXrayBench, we introduce a caption corpus consisting of 45,613 dual-view image pairs across 12 categories with corresponding captions. Building upon these data, we propose the Geometric (cross-view)-Semantic (cross-modality) Reasoner (GSR), a multimodal model that…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Radiology practices and education
