Text-guided Controllable Diffusion for Realistic Camouflage Images Generation

Yuhang Qian; Haiyan Chen; Wentong Li; Ningzhong Liu; Jie Qin

arXiv:2511.20218·cs.CV·November 26, 2025

Open Access

TL;DR

This paper introduces CT-CIG, a controllable, text-guided diffusion method that generates realistic, logically consistent camouflage images with high visual fidelity and complex patterns.

Contribution

The paper presents a new text-guided camouflage image generation framework that leverages large visual language models and a frequency refinement module for improved realism and controllability.

Findings

01. Generated images show high semantic alignment with text prompts.

02. CT-CIG produces photorealistic camouflage images with complex patterns.

03. Experimental results demonstrate superior camouflage effectiveness.

Abstract

Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. Existing methods perform CIG by either fusing objects into specific backgrounds or outpainting the surroundings via foreground object-guided diffusion. However, they often fail to obtain natural results because they overlook the logical relationship between camouflaged objects and background environments. To address this issue, we propose CT-CIG, a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images. Leveraging Large Visual Language Models (VLM), we design a Camouflage-Revealing Dialogue Mechanism (CRDM) to annotate existing camouflage datasets with high-quality text prompts. Subsequently, the constructed…

Taxonomy

Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications