FlowFixer: Towards Detail-Preserving Subject-Driven Generation
Jinyoung Jun, Won-Dong Jang, Wenbin Ouyang, Raghudeep Gadde, Jungbeom Lee

TL;DR
FlowFixer is a refinement framework for subject-driven image generation that restores fine details lost when a subject's scale or perspective changes, using direct image-to-image translation. The authors also propose a keypoint matching-based fidelity metric.
Contribution
It introduces a direct image-to-image translation approach to detail restoration, a self-supervised training scheme based on one-step denoising, and a keypoint matching-based metric for assessing fidelity in fine details.
Findings
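Outperforms state-of-the-art SDG methods in both qualitative and quantitative evaluations.
Introduces a self-supervised training data generation scheme based on one-step denoising.
Proposes a keypoint matching-based metric that assesses fidelity in fine details, beyond the semantic similarity measured by CLIP or DINO.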
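To make the idea of a keypoint matching-based fidelity score concrete, here is a minimal sketch in Python. It is not the paper's metric, whose exact definition is not given in this summary. It assumes local descriptors have already been extracted from the reference and generated images by some keypoint detector, and it scores the fraction of reference descriptors that find a confident, mutual nearest-neighbour match. The function name `match_ratio` and the `ratio` threshold are hypothetical.

```python
import numpy as np


def match_ratio(ref_desc: np.ndarray, gen_desc: np.ndarray, ratio: float = 0.8) -> float:
    """Fraction of reference descriptors with a confident match in gen_desc.

    Illustrative stand-in only (assumption): ratio test plus mutual
    nearest-neighbour check on precomputed descriptors of shape (n, dim).
    """
    if len(ref_desc) == 0 or len(gen_desc) < 2:
        return 0.0
    # Pairwise Euclidean distances, shape (n_ref, n_gen).
    dist = np.linalg.norm(ref_desc[:, None, :] - gen_desc[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(ref_desc))
    best = dist[rows, order[:, 0]]
    second = dist[rows, order[:, 1]]
    # Ratio test: the best match must be clearly closer than the runner-up.
    passes_ratio = best < ratio * second
    # Mutual check: the matched generated descriptor must also pick this reference.
    reverse_nn = np.argmin(dist, axis=0)
    mutual = reverse_nn[order[:, 0]] == rows
    return float(np.mean(passes_ratio & mutual))
```

A score of this kind rewards preserved local structure such as text, logos, or texture, which a global embedding similarity can miss. How the paper's metric is actually normalised is not stated here.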
Abstract
We present FlowFixer, a refinement framework for subject-driven generation (SDG) that restores fine details lost during generation caused by changes in scale and perspective of a subject. FlowFixer proposes direct image-to-image translation from visual references, avoiding ambiguities in language prompts. To enable image-to-image training, we introduce a one-step denoising scheme to generate self-supervised training data, which automatically removes high-frequency details while preserving global structure, effectively simulating real-world SDG errors. We further propose a keypoint matching-based metric to properly assess fidelity in details beyond semantic similarities usually measured by CLIP or DINO. Experimental results demonstrate that FlowFixer outperforms state-of-the-art SDG methods in both qualitative and quantitative evaluations, setting a new benchmark for high-fidelity…
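The self-supervised data generation described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: a clean image is mixed with Gaussian noise at a hypothetical level `noise_level`, and then a single "denoising" step produces a degraded estimate. In the paper that step would be performed by a generative model. Here a simple mean filter (`box_blur`) stands in for it, purely to show how the resulting pair loses high-frequency detail while keeping global structure. All function names and parameters are hypothetical.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Stand-in 'denoiser' (assumption): k x k mean filter with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def make_training_pair(clean: np.ndarray, noise_level: float = 0.3, seed: int = 0):
    """Return (degraded, clean) for image-to-image training on a 2D image."""
    rng = np.random.default_rng(seed)
    # Interpolate between the clean image and pure noise.
    noisy = (1.0 - noise_level) * clean + noise_level * rng.standard_normal(clean.shape)
    # One-step estimate of the clean image; the blur replaces a learned model.
    degraded = box_blur(noisy) / (1.0 - noise_level)
    return degraded, clean


def high_freq_energy(img: np.ndarray) -> float:
    """Mean squared difference between horizontally adjacent pixels."""
    return float(np.mean(np.diff(img, axis=1) ** 2))
```

A refinement model would then be trained to map `degraded` back to `clean`, so no manual annotation is needed to build the pairs.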
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
