Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models

Amir El-Ghoussani; Marc Hölle; Gustavo Carneiro; Vasileios Belagiannis

arXiv:2604.14591 · cs.CV · April 17, 2026

TL;DR

This paper introduces Masked Logit Nudging, a novel prompt-guided image editing method for visual autoregressive models that improves editing accuracy, reconstruction quality, and speed compared to previous approaches.

Contribution

It proposes Masked Logit Nudging, a guidance technique that aligns model predictions with source image tokens for precise, efficient image editing and reconstruction.

Findings

1. Achieves state-of-the-art performance on the PIE benchmark.

2. Outperforms previous VAR-based methods and rivals diffusion models in quality.

3. Provides faster image editing and reconstruction than existing methods.

Abstract

We address the problem of prompt-guided image editing in visual autoregressive models. Given a source image and a target text prompt, we aim to modify the source image according to the target prompt, while preserving all regions which are unrelated to the requested edit. To this end, we present Masked Logit Nudging, which uses the source image token maps to introduce a guidance step that aligns the model's predictions under the target prompt with these source token maps. Specifically, we convert the fixed source encodings into logits using the VAR encoding, nudging the model's predicted logits towards the targets along a semantic trajectory defined by the source-target prompts. Edits are applied only within spatial masks obtained through a dedicated masking scheme that leverages cross-attention differences between the source and edited prompts. Then, we introduce a refinement to correct…
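The guidance idea in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the paper's formulation: the function names (`edit_mask`, `source_logits`, `nudge_logits`), the threshold on the normalized cross-attention difference, the one-hot style source logits with a fixed `peak`, and the linear interpolation controlled by `alpha` are all hypothetical. The sketch also assumes that positions outside the edit mask are pulled toward the source tokens while positions inside the mask keep the target-prompt prediction.

```python
def edit_mask(attn_src, attn_tgt, threshold=0.5):
    """Hypothetical mask: mark positions whose cross-attention changes most
    between the source and target prompts (per-position scalar scores)."""
    diff = [abs(t - s) for s, t in zip(attn_src, attn_tgt)]
    lo, hi = min(diff), max(diff)
    if hi == lo:
        # No attention change anywhere: nothing is marked as editable.
        return [False] * len(diff)
    # Min-max normalize, then threshold (assumed scheme).
    return [(d - lo) / (hi - lo) > threshold for d in diff]


def source_logits(token_id, vocab_size, peak=10.0):
    """Assumed conversion of a source token index into logits:
    a one-hot style vector with value `peak` at the source token."""
    return [peak if v == token_id else 0.0 for v in range(vocab_size)]


def nudge_logits(pred_logits, source_tokens, mask, alpha=1.0, peak=10.0):
    """Interpolate predicted logits toward the source logits outside the mask.

    pred_logits   : list of per-position logit lists (target-prompt prediction)
    source_tokens : list of source token indices, one per position
    mask          : list of bools, True where the edit may change the image
    alpha         : nudging strength in [0, 1] (0 = no nudge, 1 = copy source)
    """
    nudged = []
    for logits, tok, editable in zip(pred_logits, source_tokens, mask):
        if editable:
            # Inside the edit region: keep the target-prompt prediction.
            nudged.append(list(logits))
            continue
        src = source_logits(tok, len(logits), peak)
        nudged.append([p + alpha * (s - p) for p, s in zip(logits, src)])
    return nudged


# Three token positions, vocabulary of size 2; only position 1 changes attention.
mask = edit_mask([0.1, 0.1, 0.1], [0.1, 0.9, 0.1])
pred = [[1.0, 2.0], [3.0, 0.0], [0.0, 0.0]]
result = nudge_logits(pred, [0, 1, 1], mask)
```

With `alpha=1.0` the unmasked positions reproduce the source tokens exactly, which corresponds to pure reconstruction in this toy setting; smaller values leave more of the target-prompt prediction in place.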

Code & Models

Repositories

AmirMaEl/MLN (GitHub)
