Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

Chaehun Shin; Jooyoung Choi; Johan Barthelemy; Jungbeom Lee; Sungroh Yoon

arXiv:2506.03621 · cs.CV · October 1, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Subject Fidelity Optimization (SFO), a framework that improves zero-shot subject-driven image generation by training against synthetic negative targets and reweighting diffusion timesteps, which leads to better preservation of subject details.

Contribution

The paper proposes SFO with Condition-Degradation Negative Sampling (CDNS) and timestep reweighting, advancing zero-shot subject-driven generation by explicitly guiding the model to focus on subject fidelity.

Findings

01. SFO with CDNS outperforms recent baselines in subject fidelity.

02. Reweighted diffusion timesteps enhance fine-grained subject detail learning.

03. Synthetic negatives improve the model's ability to distinguish subjects without human annotations.
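The timestep reweighting in finding 02 can be pictured with a minimal sketch. The page does not state the paper's actual weighting function, so the Gaussian bump below, the function names `intermediate_timestep_weights` and `sample_timesteps`, and the parameters `center` and `width` are all illustrative assumptions, not the authors' implementation. The sketch only shows the general idea of drawing training timesteps more often from the middle of the diffusion schedule.

```python
import math
import random


def intermediate_timestep_weights(num_steps, center=0.5, width=0.2):
    """Return normalized sampling weights that peak at intermediate timesteps.

    Assumption: a Gaussian bump over the normalized timestep in (0, 1).
    The weighting actually used in the paper is not given on this page.
    """
    raw = []
    for t in range(num_steps):
        u = (t + 0.5) / num_steps  # normalized position of timestep t
        raw.append(math.exp(-0.5 * ((u - center) / width) ** 2))
    total = sum(raw)
    return [w / total for w in raw]


def sample_timesteps(weights, batch_size, seed=0):
    """Draw a batch of timestep indices according to the given weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


weights = intermediate_timestep_weights(1000)
batch = sample_timesteps(weights, batch_size=8)
```

With these hypothetical defaults, timesteps near the middle of the schedule are sampled far more often than those near either end, so fine-tuning updates concentrate there.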

Abstract

We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps…
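The pairwise comparison described in the abstract can be sketched as a logistic (Bradley-Terry style) loss over denoising errors. This is a minimal sketch under stated assumptions, not the paper's objective: the function names, the `beta` scale, and the use of a plain mean squared error are hypothetical, and preference objectives for diffusion models typically also compare against a frozen reference model, which is omitted here for brevity.

```python
import math


def squared_error(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def pairwise_fidelity_loss(err_pos, err_neg, beta=1.0):
    """Logistic pairwise loss: -log(sigmoid(beta * (err_neg - err_pos))).

    err_pos: denoising error on the positive target (should be small).
    err_neg: denoising error on the synthetic negative (should be large).
    beta:    hypothetical scale factor, an assumption of this sketch.
    """
    margin = beta * (err_neg - err_pos)
    # -log(sigmoid(m)) equals softplus(-m); computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy usage: the model fits the positive target better than the negative one.
noise = [0.2, -0.1, 0.4]
pred_on_positive = [0.25, -0.05, 0.35]
pred_on_negative = [0.9, 0.6, -0.3]
loss = pairwise_fidelity_loss(
    squared_error(pred_on_positive, noise),
    squared_error(pred_on_negative, noise),
)
```

The loss shrinks as the error on the negative grows relative to the error on the positive, which is the sense in which the model is pushed to favor positives over negatives.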

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 4

Strengths

Well-written and easy to read and follow. A simple, well-designed, and effective method pipeline — particularly, the Condition-Degradation Negative Sampling pipeline is a strong design choice. Strong experimental results.

Weaknesses

Testing on more benchmarks could further strengthen the evidence for the method’s effectiveness. No other clear weaknesses.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- This paper is well-written and easy to follow.
- The method proposed in the paper is model-agnostic, meaning it can be applied for post-training on any base model to enhance subject fidelity.

Weaknesses

- The CDNS presented in this paper demonstrates the ability to generate "negative" samples with significantly degraded quality. However, this raises concerns. As shown in Figure 3, some negative samples exhibit an excessive discrepancy in fidelity compared to the reference images and show little to no relevance to the text prompt. Such "positive-negative" samples may not provide effective learning signals for the model. This issue warrants further investigation.
- The proposed SFO training fra

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- **Comparative fine-tuning formulation:** The paper formalizes a pairwise optimization framework (SFO) that introduces explicit comparison between positive and negative targets, aligning with recent trends in preference-based optimization for diffusion models.
- **Automatic negative construction:** The proposed CDNS procedure provides a simple yet systematic way to synthesize negative samples by degrading visual and textual conditions, avoiding the need for manual annotation.
- **Comprehensiv

Weaknesses

- **Lack of Novelty:** The proposed SFO framework lacks clear novelty, as applying DPO-style preference optimization to diffusion models has already been explored in prior works such as Wallace (2024) and related studies. The current formulation does not introduce fundamentally new theoretical insights or algorithmic mechanisms beyond these existing diffusion-based DPO approaches. The paper should clarify how SFO differs from and improves upon these earlier methods.
- **Limited Performance:** Altho

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications