FreeCond: Free Lunch in the Input Conditions of Text-Guided Inpainting

Teng-Fang Hsiao; Bo-Kai Ruan; Sung-Lin Tsai; Yi-Lun Wu; Hong-Han Shuai

arXiv:2412.00427 · cs.CV · December 3, 2024

Open Access

TL;DR

FreeCond improves text-guided inpainting by adjusting only the input conditions (the mask and the image condition). This aligns the model's attention with the user's prompt and noticeably improves output quality at no extra computational cost.

Contribution

The paper introduces FreeCond, an approach that modifies only the input conditions to counter the training bias of Stable Diffusion Inpainting (SDI). It improves inpainting quality, especially for complex prompts or prompts that deviate from the surrounding image content.

Findings

1. Up to 60% improvement in CLIP score for SDI models.
2. Improved inpainting quality without additional computation.
3. Applicable to various SDI-based models.

Abstract

In this study, we aim to identify and address the deficiency of Stable Diffusion Inpainting (SDI) in following the instructions of both the prompt and the mask. Because of the training bias introduced by masking, inpainting quality suffers when the prompt instruction and the image condition are unrelated. We therefore conduct a detailed analysis of the internal representations learned by SDI, focusing on how the mask input influences the cross-attention layers. We observe that adapting text key tokens toward the input mask enables the model to selectively paint within the given area. Leveraging these insights, we propose FreeCond, which adjusts only the input mask condition and the image condition. By increasing the latent mask value and modifying the frequency of the image condition, we align the cross-attention features with the model's training bias to improve generation quality without additional…
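The abstract names two input-side adjustments: a larger latent mask value and a change to the frequency content of the image condition. The sketch below shows where such adjustments could sit. It assumes the 9-channel UNet input used by common Stable Diffusion inpainting checkpoints (4 noisy-latent channels, 1 mask channel, 4 masked-image-latent channels). The names `build_inpaint_input`, `mask_scale`, and `keep_ratio` are hypothetical, and the radial low-pass filter is only a stand-in for "modifying the frequency". The paper's actual scaling values and frequency operation are not stated here and may differ.

```python
import numpy as np


def low_pass(x: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Hypothetical frequency edit: keep only low spatial frequencies.

    x has shape (C, H, W). keep_ratio in [0, 1] is the fraction of the
    maximum frequency radius that is retained (1.0 keeps everything,
    0.0 keeps only the DC component, i.e. the per-channel mean).
    """
    _, h, w = x.shape
    # Centered 2D spectrum of each channel.
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    # Distance of every frequency bin from the zero-frequency center.
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = np.sqrt(yy**2 + xx**2)
    keep = radius <= keep_ratio * radius.max()
    spec = spec * keep  # (H, W) mask broadcasts over channels
    out = np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1))
    return out.real


def build_inpaint_input(
    noisy_latent: np.ndarray,         # (4, H, W)
    mask: np.ndarray,                 # (H, W), 1 inside the region to paint
    masked_image_latent: np.ndarray,  # (4, H, W)
    mask_scale: float = 1.0,          # hypothetical: >1 raises the latent mask value
    keep_ratio: float = 1.0,          # hypothetical: <1 low-passes the image condition
) -> np.ndarray:
    """Assemble an SDI-style 9-channel UNet input with adjusted conditions."""
    scaled_mask = mask_scale * mask  # zeros stay zero, ones become mask_scale
    image_cond = low_pass(masked_image_latent, keep_ratio)
    return np.concatenate([noisy_latent, scaled_mask[None], image_cond], axis=0)
```

With `mask_scale=1.0` and `keep_ratio=1.0` the function reduces to plain channel concatenation, so any change in behavior comes only from the two condition adjustments. The noisy latent and the model weights are left untouched, and that is the sense in which such edits add no extra computation per denoising step.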
