Contrastive-SDXL: Annotation-Preserving Night-Time Augmentation for Pedestrian Detection
Franky George, Muhammad Khalid, Adil Khan

TL;DR
Contrastive-SDXL introduces a semantic-preserving night-time augmentation method using diffusion models, significantly improving pedestrian detection accuracy in low-light conditions by generating realistic synthetic images.
Contribution
The paper proposes a novel contrastive diffusion-based augmentation framework that maintains semantic consistency, enhancing night-time pedestrian detection performance.
Findings
Synthetic night-time images achieve a low FID of 22.5.
Detectors trained with synthetic images reduce miss rate by 6-7%.
Detection performance approaches that of training on real night-time data.
Abstract
Night-time pedestrian detection remains challenging because labelled night-time data are limited and large illumination differences make daytime-only trained detectors unreliable. Latent diffusion models (LDMs) provide a powerful basis for image-to-image translation and cross-domain augmentation, but their effectiveness in safety-critical perception depends on whether detector-relevant objects and local semantic structure are preserved when translating between source and target domains. In this work, we present Contrastive-SDXL, a day-to-night augmentation framework for night-time pedestrian detection built on SDXL-Turbo and fine-tuned using Low-Rank Adaptation (LoRA). To preserve semantic correspondence between daytime inputs and translated night-time images, we introduce a patch-wise semantic contrastive loss guided by a pretrained DINOv2 encoder rather than generator encoder…
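The patch-wise semantic contrastive loss mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so this assumes a standard InfoNCE (PatchNCE-style) objective: each translated-image patch is pulled toward the source patch at the same spatial location and pushed away from the other source patches. The function names, the temperature value, and the plain-list feature vectors (standing in for DINOv2 patch embeddings) are all illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def patch_nce_loss(src_feats, tgt_feats, temperature=0.07):
    """InfoNCE loss over patches (assumed form, not taken from the paper).

    src_feats[i] and tgt_feats[i] are feature vectors for the patch at the
    same location in the daytime input and the translated night-time image.
    The positive for translated patch i is source patch i; every other
    source patch acts as a negative.
    """
    if not src_feats or len(src_feats) != len(tgt_feats):
        raise ValueError("need equal, non-zero numbers of source and target patches")

    src = [l2_normalize(v) for v in src_feats]
    tgt = [l2_normalize(v) for v in tgt_feats]

    total = 0.0
    for i, query in enumerate(tgt):
        # Cosine similarity of this translated patch to every source patch.
        logits = [sum(a * b for a, b in zip(query, key)) / temperature for key in src]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the same-location patch as the correct class.
        total += log_denom - logits[i]
    return total / len(tgt)


if __name__ == "__main__":
    day = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    night_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    night_scrambled = [night_aligned[1], night_aligned[2], night_aligned[0]]
    print("aligned:  ", patch_nce_loss(day, night_aligned))
    print("scrambled:", patch_nce_loss(day, night_scrambled))
```

Under these assumptions, a translation that keeps each patch semantically close to its source location yields a lower loss than one whose patch content has moved, which is the property that matters if pedestrian bounding-box annotations are to be reused on the translated image.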
