Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Xiaoying Xing; Avinab Saha; Junfeng He; Susan Hao; Paul Vicol; Moonkyung Ryu; Gang Li; Sahil Singla; Sarah Young; Yinxiao Li; Feng Yang; Deepak Ramachandran

arXiv:2501.06481 · cs.CV · January 14, 2025

TL;DR

Focus-N-Fix is a region-aware fine-tuning method for text-to-image models that improves safety and quality in problematic regions without degrading overall image fidelity.

Contribution

The paper introduces a region-specific fine-tuning approach that improves image quality and safety in localized regions of T2I outputs while preserving the global structure of the image.

Findings

01. Significant improvements in safety and plausibility in targeted regions

02. Minimal or no degradation in overall image quality

03. Localized corrections outperform global fine-tuning methods

Abstract

Text-to-image (T2I) generation has made significant advances in recent years, but challenges still remain in the generation of perceptual artifacts, misalignment with complex prompts, and safety. The prevailing approach to address these issues involves collecting human feedback on generated images, training reward models to estimate human feedback, and then fine-tuning T2I models based on the reward models to align them with human preferences. However, while existing reward fine-tuning methods can produce images with higher rewards, they may change model behavior in unexpected ways. For example, fine-tuning for one quality aspect (e.g., safety) may degrade other aspects (e.g., prompt alignment), or may lead to reward hacking (e.g., finding a way to increase rewards without having the intended effect). In this paper, we propose Focus-N-Fix, a region-aware fine-tuning method that trains…
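As a rough intuition for what "region-aware" can mean, the toy sketch below contrasts an update applied everywhere with one restricted to a flagged region. It is not the Focus-N-Fix algorithm: the abstract is truncated on this page, so the paper's actual training objective is not shown here. The sketch operates on a tiny grid of pixel values rather than on model weights, and every name in it (`masked_update`, `changed_pixels`, `region_mask`, the sample values) is hypothetical.

```python
# Toy illustration only: NOT the Focus-N-Fix method.
# It shows how a binary region mask confines an update to flagged pixels.

def masked_update(image, grad, mask, step=0.5):
    """Apply a gradient-style step only where mask == 1.

    Where mask == 0 the added term is zero, so those pixels keep their value.
    """
    return [
        [px + step * g * m for px, g, m in zip(img_row, grad_row, mask_row)]
        for img_row, grad_row, mask_row in zip(image, grad, mask)
    ]


def changed_pixels(before, after):
    """Count pixels whose value differs between two same-sized grids."""
    return sum(
        1
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
        if b != a
    )


# Hypothetical 3x3 "image" and a made-up reward gradient for each pixel.
image = [[0.2, 0.4, 0.6], [0.1, 0.9, 0.3], [0.5, 0.5, 0.5]]
grad = [[1.0, -1.0, 0.5], [0.2, -0.8, 0.4], [0.3, 0.3, -0.3]]

# Assumed mask marking two "problematic" pixels in the middle row.
region_mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
# A mask of all ones reproduces an unrestricted (global) update.
global_mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

local_result = masked_update(image, grad, region_mask)
global_result = masked_update(image, grad, global_mask)

print("pixels changed (region-masked):", changed_pixels(image, local_result))
print("pixels changed (global):", changed_pixels(image, global_result))
```

In this toy setting the masked update touches only the two flagged pixels, while the unmasked update alters all nine, which mirrors the concern in the abstract that unrestricted reward fine-tuning can change behavior in unintended places.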

Taxonomy

Methods: ALIGN