Text-guided Controllable Diffusion for Realistic Camouflage Images Generation
Yuhang Qian, Haiyan Chen, Wentong Li, Ningzhong Liu, Jie Qin

TL;DR
This paper introduces CT-CIG, a controllable, text-guided diffusion method that generates realistic, logically consistent camouflage images with high visual fidelity and complex patterns.
Contribution
The paper presents a text-guided camouflage image generation framework that combines large visual language models with a frequency refinement module to improve realism and controllability.
Findings
Generated images show high semantic alignment with text prompts.
CT-CIG produces photorealistic camouflage images with complex patterns.
Experimental results demonstrate superior camouflage effectiveness.
Abstract
Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. Existing methods perform CIG by either fusing objects into specific backgrounds or outpainting the surroundings via foreground object-guided diffusion. However, they often fail to obtain natural results because they overlook the logical relationship between camouflaged objects and background environments. To address this issue, we propose CT-CIG, a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images. Leveraging Large Visual Language Models (VLM), we design a Camouflage-Revealing Dialogue Mechanism (CRDM) to annotate existing camouflage datasets with high-quality text prompts. Subsequently, the constructed…
Taxonomy
Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
