TL;DR
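EditInfinity adapts a binary-quantized generative model for text-driven image editing. It uses the exact intermediate quantized representations of the source image for inversion, avoiding the approximation errors of diffusion-based inversion, and is reported to surpass diffusion-based methods in fidelity and semantic accuracy.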
Contribution
The paper presents EditInfinity, a parameter-efficient adaptation of a binary-quantized generative model, with an efficient inversion mechanism and a smoothing strategy for precise, high-fidelity image editing guided by text prompts.
Findings
Outperforms diffusion-based baselines on PIE-Bench
Enables precise image inversion with text prompt rectification
Maintains high fidelity and semantic alignment in edits
Abstract
Adapting pretrained diffusion-based generative models for text-driven image editing with negligible tuning overhead has demonstrated remarkable potential. A classical adaptation paradigm, as followed by these methods, first infers the generative trajectory inversely for a given source image by image inversion, then performs image editing along the inferred trajectory guided by the target text prompts. However, the performance of image editing is heavily limited by the approximation errors introduced during image inversion by diffusion models, which arise from the absence of exact supervision in the intermediate generative steps. To circumvent this issue, we investigate the parameter-efficient adaptation of binary-quantized generative models for image editing, and leverage their inherent characteristic that the exact intermediate quantized representations of a source image are…
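The property the abstract relies on, that a quantized model exposes exact intermediate representations of a source image, can be illustrated with a toy sketch. The code below is not the paper's tokenizer or inversion procedure. It is a hypothetical residual binary quantizer (the names `encode_steps` and `decode_steps` and the halving step size are assumptions made for illustration) showing that the per-step binary codes are read off the source directly and deterministically, so no trajectory has to be estimated.

```python
def encode_steps(features, num_steps):
    """Toy residual binary quantizer (illustrative assumption, not the paper's method).

    At each step, every feature's residual is coded as +1 or -1 by its sign,
    and the coded amount (sign * step size) is subtracted from the residual.
    The step size halves each step, starting at 0.5, which suits inputs in [-1, 1].
    Returns one list of binary codes per step.
    """
    residual = list(features)
    step_size = 0.5
    steps = []
    for _ in range(num_steps):
        codes = [1 if r >= 0 else -1 for r in residual]
        residual = [r - c * step_size for r, c in zip(residual, codes)]
        steps.append(codes)
        step_size /= 2
    return steps


def decode_steps(steps):
    """Rebuild the features by summing each step's codes times its step size."""
    recon = [0.0] * len(steps[0])
    step_size = 0.5
    for codes in steps:
        recon = [x + c * step_size for x, c in zip(recon, codes)]
        step_size /= 2
    return recon


if __name__ == "__main__":
    source = [0.3, -0.7]                  # stand-in for source-image features
    steps = encode_steps(source, num_steps=3)
    print(steps)                          # per-step binary codes
    print(decode_steps(steps))            # coarse-to-fine reconstruction
```

Encoding the same source twice yields identical per-step codes, which is the sense in which the intermediate representations are exact. A diffusion model has no comparable direct readout of its intermediate states, which is why the abstract attributes its inversion errors to missing supervision at those steps.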