Class-specific diffusion models improve military object detection in a low-data domain

Ella P. Fokkinga; Jan Erik van Woerden; Thijs A. Eker; Sebastiaan P. Snel; Elfi I.S. Hofmeijer; Klamer Schutte; Friso G. Heslinga

arXiv:2604.18076·cs.CV·April 21, 2026

TL;DR

This paper shows that class-specific diffusion models, fine-tuned on as few as 8 real images per class, can improve military vehicle detection accuracy in low-data scenarios and can serve as an alternative to traditional data collection methods.

Contribution

The paper introduces a method for fine-tuning diffusion models to generate class-specific military objects and shows that the generated images improve detection performance when only minimal real data is available.

Findings

01

Diffusion-generated images improved detection performance by up to 8.0% mAP$_{50}$ when only 8 real images per class were available.

02

Structural guidance with ControlNet further enhanced performance in low-data regimes.

03

Synthetic data can replace traditional simulation pipelines for military AI training.
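
For readers unfamiliar with the mAP$_{50}$ metric quoted in the findings, the following is a minimal sketch of how such a score can be computed. It is not the paper's evaluation code. It assumes boxes in `(x1, y1, x2, y2)` format, greedy matching of detections to unmatched ground-truth boxes at an IoU threshold of 0.5, and all-point interpolation of the precision-recall curve; standard toolkits such as the COCO evaluator differ in details. The function names `iou`, `average_precision`, and `mean_average_precision` are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Visit detections from most to least confident and mark each as TP or FP.
    tp_flags = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            if matched[img][j]:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[img][best_j] = True
        tp_flags.append(is_tp)

    # Running precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision monotonically non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Under these assumptions, a detector's mAP$_{50}$ on a 15-class task would be the mean of the 15 per-class AP values, so a reported gain reflects the average change across all classes rather than the change for any single vehicle type.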

Abstract

Diffusion-based image synthesis has emerged as a promising source of synthetic training data for AI-based object detection and classification. In this work, we investigate whether images generated with diffusion can improve military vehicle detection under low-data conditions. We fine-tuned the text-to-image diffusion model FLUX.1 [dev] using LoRA with only 8 or 24 real images per class across 15 vehicle categories, resulting in class-specific diffusion models, which were used to generate new samples from automatically generated text prompts. The same real images were used to fine-tune the RF-DETR detector for a 15-class object detection task. Synthetic datasets generated by the diffusion models were then used to further improve detector performance. Importantly, no additional real data was required, as the generative models leveraged the same limited training samples. FLUX-generated…
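
The abstract mentions automatically generated text prompts but this page does not describe how they are built. As a purely illustrative sketch, one simple approach is to combine a class identifier with a small vocabulary of scene attributes. Everything below is an assumption for illustration: the attribute lists, the prompt template, the function name `build_prompts`, and the placeholder class token are hypothetical and are not taken from the paper.

```python
import itertools
import random

# Hypothetical scene vocabulary; the paper's actual prompt generator is not shown on this page.
ENVIRONMENTS = ["in a forest clearing", "on a desert road", "in a snowy field"]
VIEWPOINTS = ["seen from the front", "seen from the side", "seen from above"]
CONDITIONS = ["at dawn", "under overcast skies", "in bright sunlight"]


def build_prompts(class_token, n_prompts, seed=0):
    """Return n_prompts text prompts for one class, reproducible for a given seed."""
    rng = random.Random(seed)
    combos = list(itertools.product(ENVIRONMENTS, VIEWPOINTS, CONDITIONS))
    rng.shuffle(combos)
    # Cycle through the shuffled combinations if more prompts are requested
    # than there are unique combinations.
    chosen = [combos[i % len(combos)] for i in range(n_prompts)]
    return [
        f"a photo of a {class_token} {env}, {view}, {cond}"
        for env, view, cond in chosen
    ]


if __name__ == "__main__":
    # "vehicle-class-01" is a placeholder, not a class name from the paper.
    for prompt in build_prompts("vehicle-class-01", 3):
        print(prompt)
```

In a setup like the one the abstract describes, each prompt list would be passed to the diffusion model fine-tuned for that class, so the class identity comes from the fine-tuned weights while the prompt varies the scene.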
