PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

Guandong Li; Mengxia Ye

arXiv:2605.00707 · cs.CV · May 4, 2026

TL;DR

PhysEdit introduces adaptive reasoning modules that customize spatial and temporal inference for diverse image editing tasks, improving efficiency and accuracy without retraining.

Contribution

The paper presents an adaptive inference framework whose two modules, CARD (Complexity-Adaptive Reasoning Depth) and SRM, improve reasoning efficiency and accuracy in image editing.

Findings

01. PhysEdit achieves a 1.18x speedup over baseline methods.

02. It slightly improves instruction adherence by 0.7%.

03. Speedup reaches 1.52x on appearance-level edits.
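
To make the speedup figures concrete, the short sketch below converts a speedup factor into the fraction of inference time saved. It assumes that speedup is defined as baseline time divided by PhysEdit time, which this summary does not state explicitly; the function name `latency_reduction` is introduced here for illustration only.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of baseline wall-clock time saved at a given speedup factor.

    Assumes speedup = baseline_time / new_time (an assumption, not stated
    in the summary above).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors quoted in the findings above.
for label, factor in [("overall", 1.18), ("appearance-level edits", 1.52)]:
    saved = latency_reduction(factor)
    print(f"{label}: {factor:.2f}x -> {saved:.1%} less time")
```

Under that assumption, a 1.18x speedup corresponds to roughly 15% less inference time, and 1.52x to roughly 34% less.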

Abstract

Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatial and temporal axes is the missing degree of freedom, and we present PhysEdit, an editing framework built around this principle. PhysEdit introduces two inference-time modules that compose without retraining the backbone. At its core, (1) Complexity-Adaptive Reasoning Depth (CARD) predicts edit complexity directly from the instruction and reference image and allocates the reasoning step count N_r and reasoning-token length r per sample -- turning a previously fixed inference schedule into a conditional-computation problem. CARD is supported by (2) a Spatial Reasoning Mask…
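
The conditional-computation idea behind CARD can be illustrated with a toy sketch: score an instruction's complexity, then scale the reasoning step count N_r and the reasoning-token length r with that score. Everything concrete here is an assumption made for illustration and is not taken from the paper: the keyword table, the budget ranges, the linear mapping, and the names `estimate_complexity`, `allocate_budget`, and `ReasoningBudget`. In particular, the abstract says CARD predicts complexity from both the instruction and the reference image, whereas this stand-in looks at the instruction text only.

```python
from dataclasses import dataclass

# Hypothetical keyword weights standing in for a learned complexity predictor.
# The abstract's CARD also conditions on the reference image; this toy does not.
KEYWORD_COMPLEXITY = {
    "color": 0.1, "recolor": 0.1,               # appearance-level edits
    "add": 0.5, "insert": 0.5, "remove": 0.5,   # object-level edits
    "pour": 0.9, "drop": 0.9, "melt": 0.9,      # physical-action edits
}


@dataclass(frozen=True)
class ReasoningBudget:
    n_r: int  # reasoning step count (N_r in the abstract)
    r: int    # reasoning-token length


def estimate_complexity(instruction: str) -> float:
    """Return a score in [0, 1]: the largest keyword weight found, else 0.5."""
    words = instruction.lower().split()
    scores = [KEYWORD_COMPLEXITY[w] for w in words if w in KEYWORD_COMPLEXITY]
    return max(scores) if scores else 0.5


def allocate_budget(complexity: float,
                    n_r_range: tuple = (2, 8),
                    r_range: tuple = (16, 64)) -> ReasoningBudget:
    """Linearly interpolate each budget between its (assumed) min and max."""
    c = min(max(complexity, 0.0), 1.0)  # clamp to [0, 1]
    n_r = round(n_r_range[0] + c * (n_r_range[1] - n_r_range[0]))
    r = round(r_range[0] + c * (r_range[1] - r_range[0]))
    return ReasoningBudget(n_r=n_r, r=r)


for text in ["change the car color to red", "pour water into the glass"]:
    print(text, "->", allocate_budget(estimate_complexity(text)))
```

The point of the sketch is only the shape of the computation: a simple edit receives a small budget and a physical-action edit a larger one, rather than both sharing one fixed schedule.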
