Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

Jiayi Guo; Linqing Wang; Jiangshan Wang; Yang Yue; Zeyu Liu; Zhiyuan Zhao; Qinglin Lu; Gao Huang; Chunyu Wang

arXiv:2604.25636 · cs.CV · April 29, 2026

TL;DR

The paper introduces Refinement via Regeneration (RvR), a method for image refinement in unified multimodal models. Rather than editing the initial image, RvR regenerates it conditioned on the prompt and semantic tokens, which widens the space of modifications available during refinement.

Contribution

The paper proposes a shift from editing-based refinement to regeneration-based refinement, which enlarges the modification space and improves semantic alignment between the prompt and the refined image.

Findings

1. RvR improves the GenEval score from 0.78 to 0.91.
2. RvR increases the DPG-Bench score from 84.02 to 87.21.
3. RvR boosts the UniGenBench++ score from 61.53 to 77.41.
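
To put the three reported improvements on a comparable footing, the short sketch below computes the absolute and relative gain for each benchmark. The before/after numbers are copied from the findings above. The dictionary layout and the `gains` helper are illustrative names, not part of the paper or its code release. Note that the GenEval figures appear to be on a 0 to 1 scale while the other two appear to be on a 0 to 100 scale, so the absolute gains are not directly comparable across benchmarks; the relative gains are.

```python
# Scores before and after RvR, as reported in the findings above.
# Assumption: GenEval is on a 0-1 scale, the other two on a 0-100 scale.
scores = {
    "GenEval": (0.78, 0.91),
    "DPG-Bench": (84.02, 87.21),
    "UniGenBench++": (61.53, 77.41),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent of the baseline)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


# Hypothetical summary structure: benchmark name -> (absolute, relative %).
summary = {name: gains(before, after) for name, (before, after) in scores.items()}

for name, (abs_gain, rel_gain) in summary.items():
    print(f"{name}: +{abs_gain:.2f} absolute, +{rel_gain:.1f}% relative")
```

On these numbers the relative gain is largest on UniGenBench++ (roughly 26%), followed by GenEval (roughly 17%), and smallest on DPG-Bench (roughly 4%), where the baseline was already high.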

Abstract

Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily follow a refinement-via-editing (RvE) paradigm, where UMMs produce editing instructions to modify misaligned regions while preserving aligned content. However, editing instructions often describe prompt-image misalignment only coarsely, leading to incomplete refinement. Moreover, pixel-level preservation, though necessary for editing, unnecessarily restricts the effective modification space for refinement. To address these limitations, we propose Refinement via Regeneration (RvR), a novel framework that reformulates refinement as conditional image regeneration rather than…

Code & Models

Repositories

leaplabthu/RvR (GitHub)

Models

JiayiGuo821/RvR-7B-MoT (model, 61 downloads, 3 likes)

Datasets

JiayiGuo821/RvR-Data (dataset, 88 downloads)
