Inline Critic Steers Image Editing
Weitai Kang, Xiaohang Zhan, Yizhou Wang, Mang Tik Chiu, Jason Kuen, Kangning Liu, Yan Yan

TL;DR
This paper introduces Inline Critic, a learnable token that critiques a frozen image-editing model's intermediate predictions and steers its hidden states during the forward pass, achieving state-of-the-art results on several benchmarks.
Contribution
It proposes a novel inline critiquing mechanism that operates within the model's forward pass, improving image editing quality without additional inference steps.
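To make "operates within the model's forward pass" concrete, here is a toy sketch. It is not the paper's architecture. A frozen stack of layers runs once. At one hypothetical intermediate layer, a learnable critic vector reads the hidden state and adds a correction, and the remaining frozen layers then continue. The layer functions, `critic_correction`, the dot-product scoring, and `gain` are all illustrative assumptions. In this picture only the critic token would be trained, and the backbone stays fixed.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_frozen_layers(num_layers: int) -> List[Layer]:
    """Hypothetical stand-in for a frozen backbone: fixed elementwise affine layers."""
    def make_layer(i: int) -> Layer:
        scale, shift = 1.0 + 0.1 * i, 0.01 * i  # never updated
        return lambda h: [scale * x + shift for x in h]
    return [make_layer(i) for i in range(num_layers)]


def critic_correction(hidden: Vector, critic_token: Vector, gain: float) -> Vector:
    """Hypothetical critic: score how strongly the hidden state aligns with the
    learnable token, then return a correction along the token's direction."""
    score = sum(h * c for h, c in zip(hidden, critic_token))
    return [gain * score * c for c in critic_token]


def forward(
    x: Vector,
    layers: List[Layer],
    critic_layer: Optional[int] = None,
    critic_token: Optional[Vector] = None,
    gain: float = 0.1,
) -> Vector:
    """Single forward pass; optionally steer the hidden state at `critic_layer`."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if critic_token is not None and i == critic_layer:
            delta = critic_correction(h, critic_token, gain)
            h = [a + d for a, d in zip(h, delta)]  # steer, then keep going
    return h


layers = make_frozen_layers(4)
x = [0.5, -1.0, 2.0]
print(forward(x, layers))                                            # frozen backbone only
print(forward(x, layers, critic_layer=1, critic_token=[1.0, 0.0, 0.0]))  # steered
```

With an all-zero critic token, the output matches the unmodified backbone exactly. Each backbone layer is still called exactly once, so in this toy the steering adds no extra pass.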
Findings
Achieved state-of-the-art results on GEdit-Bench with a score of 7.89.
Improved RISEBench performance by +9.4 over the backbone.
Surpassed GPT-4o on KRIS-Bench with a score of 81.92.
Abstract
Instruction-based image editing exhibits heterogeneous difficulty not only across cases but also across regions of an image, motivating refinement approaches that allocate correction to where the model struggles. Existing refinement signals arrive late, after a fully generated image or a completed denoising step. We ask whether such a signal can act within an ongoing forward pass. To investigate this, we probe a frozen image-editing model and find that although generation capability emerges only in the last few layers, the error pattern is already set in early layers (rank correlation ρ = 0.83 with the final-layer error map). Based on this, we introduce Inline Critic, a learnable token that critiques a frozen model's predictions at its intermediate layers and steers its hidden states to refine generation during the forward pass. A three-stage recipe is proposed to stabilize the…
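The probe compares where the model errs at an early layer with where it errs at the final layer, using a rank correlation. Below is a minimal sketch of that comparison. It assumes ρ denotes Spearman's rank correlation computed over the locations (pixels or patches) of two same-shaped error maps. The text above does not say how an error map is read out of an intermediate layer, so the maps in the usage lines are hypothetical toy values. A ρ near 1 means both layers rank the same regions as hardest.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(map_a: Sequence[Sequence[float]], map_b: Sequence[Sequence[float]]) -> float:
    """Spearman's rho between two 2D error maps, flattened row-major."""
    flat_a = [v for row in map_a for v in row]
    flat_b = [v for row in map_b for v in row]
    if len(flat_a) != len(flat_b) or len(flat_a) < 2:
        raise ValueError("maps must have the same number (>= 2) of entries")
    # Spearman's rho is Pearson correlation applied to the ranks.
    return pearson(average_ranks(flat_a), average_ranks(flat_b))


early_err = [[0.10, 0.40], [0.30, 0.90]]  # hypothetical early-layer error map
final_err = [[0.05, 0.50], [0.20, 1.20]]  # hypothetical final-layer error map
print(spearman_rho(early_err, final_err))  # same ordering of regions -> 1.0
```

Ranking before correlating makes the measure insensitive to the overall error scale, which plausibly differs between an early layer and the final layer. Only the relative ordering of regions matters.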
