PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics
Tianyidan Xie, Zhentao Huang, Mingjie Wang, Xin Huang, Jun Zhou, Minglun Gong, Zili Yi

TL;DR
PhysLayer is a novel framework that enables language-guided, depth-aware layered animation of static images, improving physical plausibility and control over object dynamics without full 3D reconstruction.
Contribution
PhysLayer introduces a three-component system that combines language-guided scene understanding, depth-aware physics simulation, and video synthesis for realistic, controllable image animation.
Findings
CLIP-Similarity increased by 2.2%
FID score improved by 9.3%
Human evaluation showed 24% better physical plausibility
Abstract
Existing image-to-video generation methods often produce physically implausible motions and lack precise control over object dynamics. While prior approaches have incorporated physics simulators, they remain confined to 2D planar motions and fail to capture depth-aware spatial interactions. We introduce PhysLayer, a novel framework enabling language-guided, depth-aware layered animation of static images. PhysLayer consists of three key components: First, a language-guided scene understanding module that utilizes vision foundation models to decompose scenes into depth-based layers by analyzing object composition, material properties, and physical parameters. Second, a depth-aware layered physics simulation that extends 2D rigid-body dynamics with depth motion and perspective-consistent scaling, enabling more realistic object interactions without requiring full 3D reconstruction. Third, a…
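The second component, 2D rigid-body dynamics extended with depth motion and perspective-consistent scaling, can be illustrated with a small sketch. This is not the authors' implementation: the `Layer` class, the pinhole-style rule `scale = z_ref / z`, the semi-implicit Euler step, and the far-to-near draw order are all assumptions chosen for illustration, and every name and parameter below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical image layer with a 2D position plus a scalar depth."""
    name: str
    x: float          # horizontal image-plane position
    y: float          # vertical image-plane position (grows downward)
    z: float          # depth; larger means farther from the camera
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0   # depth velocity (the "depth motion" term)


def perspective_scale(z: float, z_ref: float) -> float:
    """Assumed pinhole rule: apparent size is inversely proportional to depth."""
    if z <= 0 or z_ref <= 0:
        raise ValueError("depths must be positive")
    return z_ref / z


def step(layer: Layer, dt: float, gravity: float = 9.8) -> None:
    """Advance one layer by dt with semi-implicit Euler integration."""
    layer.vy += gravity * dt      # gravity acts along the image-plane y axis
    layer.x += layer.vx * dt
    layer.y += layer.vy * dt
    layer.z += layer.vz * dt      # depth changes alongside planar motion


def draw_order(layers: list[Layer]) -> list[Layer]:
    """Painter's algorithm: composite far layers first, near layers last."""
    return sorted(layers, key=lambda layer: layer.z, reverse=True)


if __name__ == "__main__":
    ball = Layer("ball", x=0.0, y=0.0, z=2.0, vz=1.0)
    wall = Layer("wall", x=0.0, y=0.0, z=5.0)
    z_start = ball.z
    step(ball, dt=1.0, gravity=0.0)
    print(perspective_scale(ball.z, z_ref=z_start))
    print([layer.name for layer in draw_order([ball, wall])])
```

In this sketch a layer that recedes in depth shrinks on screen without any 3D mesh being built, which is the kind of effect the abstract attributes to perspective-consistent scaling.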
