Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
Chaehun Shin, Jooyoung Choi, Johan Barthelemy, Jungbeom Lee, Sungroh Yoon

TL;DR
This paper introduces Subject Fidelity Optimization (SFO), a new framework that improves zero-shot subject-driven image generation by using synthetic negatives and reweighted diffusion steps, leading to better subject detail preservation.
Contribution
The paper proposes SFO with Condition-Degradation Negative Sampling (CDNS) and timestep reweighting, advancing zero-shot subject-driven generation by explicitly guiding the model to focus on subject fidelity.
Findings
SFO with CDNS outperforms recent baselines in subject fidelity.
Reweighted diffusion timesteps enhance fine-grained subject detail learning.
Synthetic negatives improve the model's ability to distinguish subjects without human annotations.
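To make the timestep-reweighting finding concrete, here is a minimal sketch of one way to bias training toward intermediate diffusion timesteps. The bell-shaped weighting, the function names, and the `center` and `width` parameters are all illustrative assumptions; the summary does not specify the weighting the authors actually use.

```python
import numpy as np


def intermediate_timestep_weights(num_steps: int, center: float = 0.5, width: float = 0.2) -> np.ndarray:
    """Hypothetical bell-shaped sampling probabilities over timesteps 0..num_steps-1.

    `center` and `width` are expressed as fractions of the full schedule.
    This is an assumed shape, not the paper's weighting function.
    """
    t = np.arange(num_steps) / (num_steps - 1)      # normalize timesteps to [0, 1]
    w = np.exp(-0.5 * ((t - center) / width) ** 2)  # peak at the chosen center
    return w / w.sum()                              # normalize to a probability vector


def sample_timesteps(num_steps: int, batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw training timesteps so that intermediate steps are visited more often."""
    probs = intermediate_timestep_weights(num_steps)
    return rng.choice(num_steps, size=batch_size, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sample_timesteps(num_steps=1000, batch_size=8, rng=rng))
```

Sampling timesteps from such a distribution is one option; multiplying the per-timestep loss by the same weights is another, and the two are equivalent in expectation up to a constant factor.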
Abstract
We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps…
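The pairwise comparison described in the abstract can be illustrated with a small sketch. It assumes a Diffusion-DPO-style objective, in which the fine-tuned model should reduce its denoising error on the positive target, relative to a frozen reference model, by more than it does on the negative target. The exact SFO loss is not given in this excerpt, so the function name, the `beta` parameter, and the use of a reference model are assumptions made for illustration.

```python
import numpy as np


def pairwise_comparison_loss(err_pos, err_neg, err_pos_ref, err_neg_ref, beta: float = 1.0) -> float:
    """Assumed DPO-style pairwise loss on per-sample denoising errors.

    err_pos / err_neg:         errors of the model being fine-tuned on positive / negative targets
    err_pos_ref / err_neg_ref: errors of a frozen reference model on the same targets
    A lower error means the model assigns higher likelihood to that target.
    """
    err_pos, err_neg = np.asarray(err_pos, dtype=float), np.asarray(err_neg, dtype=float)
    err_pos_ref, err_neg_ref = np.asarray(err_pos_ref, dtype=float), np.asarray(err_neg_ref, dtype=float)

    # Positive margin: the model improved on the positive more than on the negative.
    margin = (err_neg - err_neg_ref) - (err_pos - err_pos_ref)

    # -log(sigmoid(beta * margin)), written as a numerically stable softplus.
    return float(np.mean(np.logaddexp(0.0, -beta * margin)))


if __name__ == "__main__":
    # Toy numbers: the model fits the positive better than the reference does.
    print(pairwise_comparison_loss([0.2], [0.9], [0.5], [0.5]))
```

When the model and the reference have identical errors the margin is zero and the loss equals log 2; it falls below that value once the model favors the positive over the negative.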
Peer Reviews
Decision · Submitted to ICLR 2026
Well-written and easy to read and follow. A simple, well-designed, and effective method pipeline — particularly, the Condition-Degradation Negative Sampling pipeline is a strong design choice. Strong experimental results.
Testing on more benchmarks could further strengthen the evidence for the method’s effectiveness. No other clear weaknesses.
- This paper is well-written and easy to follow.
- The method proposed in the paper is model-agnostic, meaning it can be applied for post-training on any base model to enhance subject fidelity.
- The CDNS presented in this paper demonstrates the ability to generate "negative" samples with significantly degraded quality. However, this raises concerns. As shown in Figure 3, some negative samples exhibit an excessive discrepancy in fidelity compared to the reference images and show little to no relevance to the text prompt. Such "positive-negative" samples may not provide effective learning signals for the model. This issue warrants further investigation.
- The proposed SFO training fra
- **Comparative fine-tuning formulation:** The paper formalizes a pairwise optimization framework (SFO) that introduces explicit comparison between positive and negative targets, aligning with recent trends in preference-based optimization for diffusion models.
- **Automatic negative construction:** The proposed CDNS procedure provides a simple yet systematic way to synthesize negative samples by degrading visual and textual conditions, avoiding the need for manual annotation.
- **Comprehensiv
**Lack of Novelty:** The proposed SFO framework lacks clear novelty, as applying DPO-style preference optimization to diffusion models has already been explored in prior works such as Wallace (2024) and related studies. The current formulation does not introduce fundamentally new theoretical insights or algorithmic mechanisms beyond these existing diffusion-based DPO approaches. The paper should clarify how SFO differs from and improves upon these earlier methods.
**Limited Performance:** Altho
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
