Taming Generative Synthetic Data for X-ray Prohibited Item Detection
Jialong Sun, Hongguang Zhu, Weizhe Liu, Yunda Sun, Renshuai Tao, Yunchao Wei

TL;DR
This paper introduces Xsyn, a one-stage text-to-image synthesis method for generating high-quality X-ray security images to improve prohibited item detection, reducing labor costs and outperforming previous approaches.
Contribution
Xsyn is presented as the first one-stage pipeline for synthesizing X-ray security images without extra labor cost, using Cross-Attention Refinement (CAR) and background occlusion modeling.
Findings
Xsyn achieves 1.2% higher mAP than previous methods.
Synthetic images improve detection performance across datasets.
The one-stage design removes the labor-intensive foreground extraction stage that earlier two-stage pipelines require.
Abstract
Training prohibited item detection models requires a large amount of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address data insufficiency, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline, where they implement labor-intensive foreground extraction in the first stage and then composite images in the second stage. Such a pipeline introduces inevitable extra labor cost and is not efficient. In this paper, we propose a one-stage X-ray security image synthesis pipeline (Xsyn) based on text-to-image generation, which incorporates two effective strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation.…
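The abstract's description of CAR, refining a bounding box from a cross-attention map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `refine_box_from_attention`, the relative threshold, and the assumption that the attention map has already been resized to image resolution are all hypothetical choices made for illustration.

```python
import numpy as np

def refine_box_from_attention(attn, coarse_box, rel_threshold=0.5):
    """Shrink a coarse box to the high-attention pixels inside it.

    attn: 2D array (H, W) of non-negative attention weights for the
          item's text token, assumed already resized to image size.
    coarse_box: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    rel_threshold: hypothetical cutoff, as a fraction of the peak
          attention inside the coarse box.
    """
    x0, y0, x1, y1 = coarse_box
    region = attn[y0:y1, x0:x1]
    # Fall back to the coarse box when there is no usable signal.
    if region.size == 0 or region.max() <= 0:
        return coarse_box
    mask = region >= rel_threshold * region.max()
    if not mask.any():
        return coarse_box
    ys, xs = np.nonzero(mask)
    # Tight box around the retained pixels, mapped back to image coords.
    return (x0 + int(xs.min()), y0 + int(ys.min()),
            x0 + int(xs.max()) + 1, y0 + int(ys.max()) + 1)

# Toy example: an 8x8 map with attention concentrated in a 3x3 patch.
attn = np.zeros((8, 8))
attn[2:5, 3:6] = 1.0
refined = refine_box_from_attention(attn, (1, 1, 7, 7))
```

A fixed relative threshold is only one plausible way to binarize the map; the paper may use a different rule.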
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Multimodal Machine Learning Applications
