Inference-Time Scaling of Diffusion Models for Infrared Data Generation
Kai A. Horstmann, Maxim Clouser, Kia Khezeli

TL;DR
This paper presents an inference-time guidance method that uses a domain-adapted CLIP verifier to improve the quality of infrared images generated by diffusion models, addressing the scarcity of annotated infrared data.
Contribution
The paper introduces a novel inference-time scaling approach in which a domain-adapted CLIP-based verifier is used to enhance the quality of infrared images generated by a diffusion model.
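To make the idea of verifier-driven inference-time scaling concrete, here is a minimal sketch of one common form of it, best-of-N selection: draw several candidates, score each with a verifier, and keep the highest-scoring one. This is an illustration under stated assumptions, not the authors' implementation. The summary does not say which search strategy the paper uses, and the names `best_of_n`, `toy_generate`, and `toy_embed` are hypothetical stand-ins for a diffusion sampler and a CLIP image encoder.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_of_n(generate, embed_image, text_embedding, n_candidates, seed=0):
    """Sample n_candidates images and keep the one the verifier scores highest.

    generate(rng)      -> one candidate image (hypothetical sampler interface)
    embed_image(image) -> 1-D embedding (hypothetical verifier image encoder)
    text_embedding     -> 1-D embedding of the prompt from the same verifier
    """
    rng = np.random.default_rng(seed)
    best_image, best_score = None, -np.inf
    for _ in range(n_candidates):
        image = generate(rng)
        score = cosine_similarity(embed_image(image), text_embedding)
        if score > best_score:
            best_image, best_score = image, score
    return best_image, best_score


# Toy stand-ins (assumptions): an "image" is a random vector and the
# "encoder" is the identity, so the sketch runs without any model weights.
def toy_generate(rng):
    return rng.normal(size=8)


def toy_embed(image):
    return image


prompt_embedding = np.ones(8)
_, score_1 = best_of_n(toy_generate, toy_embed, prompt_embedding, n_candidates=1)
_, score_16 = best_of_n(toy_generate, toy_embed, prompt_embedding, n_candidates=16)
print(f"verifier score with 1 candidate:   {score_1:.3f}")
print(f"verifier score with 16 candidates: {score_16:.3f}")
```

With a fixed seed, the first candidate is the same in both runs, so the score with 16 candidates can never be lower than with one. Spending more compute at inference time buys a better verifier score without retraining the generator, which is the trade-off the approach relies on.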
Findings
10% reduction in FID score on the KAIST dataset
Improved alignment of generated images with text prompts
Effective guidance in low-data infrared settings
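For readers unfamiliar with the FID metric cited above, the following sketch computes the Fréchet distance between Gaussian fits to two feature sets, FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), where lower is better. It is an illustration only: in practice the features come from a pretrained Inception network, whereas the random arrays here are placeholders, and the function name `frechet_distance` is hypothetical rather than taken from the paper.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussian fits to two (n_samples, dim) arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S_r S_g, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))        # placeholder "real" features
generated = real + 3.0                  # same spread, mean shifted by 3 per dim

print(frechet_distance(real, real))       # identical sets: approximately 0
print(frechet_distance(real, generated))  # approximately 4 * 3**2 = 36
```

A 10% reduction in FID therefore means the generated feature distribution moved measurably closer to the real one in both mean and covariance.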
Abstract
Infrared imagery enables temperature-based scene understanding using passive sensors, particularly under conditions of low visibility where traditional RGB imaging fails. Yet, developing downstream vision models for infrared applications is hindered by the scarcity of high-quality annotated data, due to the specialized expertise required for infrared annotation. While synthetic infrared image generation has the potential to accelerate model development by providing large-scale, diverse training data, training foundation-level generative diffusion models in the infrared domain has remained elusive due to limited datasets. In light of such data constraints, we explore an inference-time scaling approach using a domain-adapted CLIP-based verifier for enhanced infrared image generation quality. We adapt FLUX.1-dev, a state-of-the-art text-to-image diffusion model, to the infrared domain by…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Face recognition and analysis
