Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang

TL;DR
The paper introduces Refinement via Regeneration (RvR), an image refinement method for unified multimodal models (UMMs) that enlarges the modification space by regenerating the image conditioned on the prompt and semantic tokens, rather than editing it.
Contribution
The paper proposes shifting from editing-based refinement to regeneration-based refinement, which enlarges the modification space and improves semantic alignment in image refinement tasks.
Findings
RvR improves the GenEval score from 0.78 to 0.91.
RvR increases the DPGBench score from 84.02 to 87.21.
RvR boosts the UniGenBench++ score from 61.53 to 77.41.
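The three benchmarks use different scales (GenEval is reported on a 0 to 1 scale, the other two on what appear to be 0 to 100 scales), so the raw numbers are not directly comparable. As a rough aid, the sketch below computes the absolute and relative gain for each pair of reported scores. It assumes the first value in each finding is the score before refinement and the second is the score after RvR; the names `reported`, `gains`, and `summary` are illustrative and do not come from the paper.

```python
# Scores copied from the Findings list: (before, after).
# Assumption: "before" is the score without RvR, "after" is the score with RvR.
reported = {
    "GenEval": (0.78, 0.91),
    "DPGBench": (84.02, 87.21),
    "UniGenBench++": (61.53, 77.41),
}


def gains(before, after):
    """Return (absolute gain, relative gain in percent of the starting score)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


# Map each benchmark name to its (absolute, relative) gain.
summary = {name: gains(before, after) for name, (before, after) in reported.items()}

for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute:.2f} absolute, +{relative:.1f}% relative")
```

Under these assumptions, the largest relative gain is on UniGenBench++ and the smallest is on DPGBench, where the starting score is already high.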
Abstract
Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily follow a refinement-via-editing (RvE) paradigm, where UMMs produce editing instructions to modify misaligned regions while preserving aligned content. However, editing instructions often describe prompt-image misalignment only coarsely, leading to incomplete refinement. Moreover, pixel-level preservation, though necessary for editing, unnecessarily restricts the effective modification space for refinement. To address these limitations, we propose Refinement via Regeneration (RvR), a novel framework that reformulates refinement as conditional image regeneration rather than…
