Contrastive-SDXL: Annotation-Preserving Night-Time Augmentation for Pedestrian Detection

Franky George; Muhammad Khalid; Adil Khan

arXiv:2605.16406·cs.CV·May 19, 2026

TL;DR

Contrastive-SDXL is a semantics-preserving day-to-night augmentation method built on a diffusion model. It generates realistic synthetic night-time images, and detectors trained with them achieve lower miss rates for pedestrian detection in low-light conditions.

Contribution

The paper proposes a novel contrastive diffusion-based augmentation framework that maintains semantic consistency, enhancing night-time pedestrian detection performance.

Findings

01. Synthetic night-time images achieve a low FID of 22.5.

02. Detectors trained with synthetic images reduce miss rate by 6-7%.

03. Detection performance approaches that of training on real night-time data.

Abstract

Night-time pedestrian detection remains challenging because labelled night-time data are limited and large illumination differences make daytime-only trained detectors unreliable. Latent diffusion models (LDMs) provide a powerful basis for image-to-image translation and cross-domain augmentation, but their effectiveness in safety-critical perception depends on whether detector-relevant objects and local semantic structure are preserved when translating between source and target domains. In this work, we present Contrastive-SDXL, a day-to-night augmentation framework for night-time pedestrian detection built on SDXL-Turbo and fine-tuned using Low-Rank Adaptation (LoRA). To preserve semantic correspondence between daytime inputs and translated night-time images, we introduce a patch-wise semantic contrastive loss guided by a pretrained DINOv2 encoder rather than generator encoder…
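The patch-wise contrastive idea in the abstract can be illustrated with a small sketch. The abstract is truncated here, so the exact loss is not known; the code below assumes a standard InfoNCE form in which patch i of the translated image should match patch i of the daytime input and differ from every other patch. The function name `patch_contrastive_loss`, the temperature of 0.07, and the use of plain Python lists in place of DINOv2 patch embeddings are all assumptions made for illustration, not details taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def patch_contrastive_loss(src_feats, out_feats, temperature=0.07):
    """Hypothetical patch-wise InfoNCE loss (illustrative, not the paper's code).

    src_feats: one feature vector per patch of the daytime input.
    out_feats: one feature vector per patch of the translated night image,
               in the same patch order.
    The positive for patch i is the source patch at the same index;
    all other source patches act as negatives.
    """
    if not src_feats or len(src_feats) != len(out_feats):
        raise ValueError("need the same non-zero number of patches in both images")

    src = [_normalize(v) for v in src_feats]
    out = [_normalize(v) for v in out_feats]

    total = 0.0
    for i, query in enumerate(out):
        # Cosine similarity of this translated patch to every source patch.
        logits = [sum(a * b for a, b in zip(query, key)) / temperature for key in src]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the same-index patch as the target class.
        total += log_denom - logits[i]
    return total / len(out)


if __name__ == "__main__":
    day = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = [row[:] for row in day]       # layout preserved
    shuffled = [day[1], day[2], day[0]]     # patches moved around
    print(patch_contrastive_loss(day, aligned))
    print(patch_contrastive_loss(day, shuffled))
```

In this toy setting the loss is near zero when the translated patches line up with the source patches and large when they are permuted, which is the behaviour one would want from a term meant to keep pedestrians and local structure in place during translation.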
