From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors
Liangbing Zhao, Le Zhuo, Sayak Paul, Hongsheng Li, Mohamed Elhoseiny

TL;DR
This paper introduces PhysicEdit, a physics-aware image editing framework that models physical state transitions, leveraging a large video dataset and a dual-thinking reasoning mechanism to produce more physically plausible edits.
Contribution
The paper formulates image editing as predictive physical state transitions and introduces the PhysicTran38K dataset and the PhysicEdit framework, which uses a textual-visual dual-thinking mechanism for improved realism.
Findings
PhysicEdit outperforms previous models in physical realism by 5.9%.
PhysicEdit improves knowledge-grounded editing accuracy by 10.1%.
PhysicEdit achieves state-of-the-art results among open-source methods.
Abstract
Instruction-based image editing has achieved remarkable success in semantic alignment, yet state-of-the-art models frequently fail to render physically plausible results when editing involves complex causal dynamics, such as refraction or material deformation. We attribute this limitation to the dominant paradigm that treats editing as a discrete mapping between image pairs, which provides only boundary conditions and leaves transition dynamics underspecified. To address this, we reformulate physics-aware editing as predictive physical state transitions and introduce PhysicTran38K, a large-scale video-based dataset comprising 38K transition trajectories across five physical domains, constructed via a two-stage filtering and constraint-aware annotation pipeline. Building on this supervision, we propose PhysicEdit, an end-to-end framework equipped with a textual-visual dual-thinking…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
