PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics

Tianyidan Xie; Zhentao Huang; Mingjie Wang; Xin Huang; Jun Zhou; Minglun Gong; Zili Yi

arXiv:2604.23574 · cs.CV · April 28, 2026

TL;DR

PhysLayer is a novel framework that enables language-guided, depth-aware layered animation of static images, improving physical plausibility and control over object dynamics without full 3D reconstruction.

Contribution

PhysLayer introduces a three-component system that combines language-guided scene understanding, depth-aware physics simulation, and video synthesis for realistic, controllable image animation.

Findings

01. CLIP-Similarity increased by 2.2%

02. FID score improved by 9.3%

03. Human evaluation showed 24% better physical plausibility

Abstract

Existing image-to-video generation methods often produce physically implausible motions and lack precise control over object dynamics. While prior approaches have incorporated physics simulators, they remain confined to 2D planar motions and fail to capture depth-aware spatial interactions. We introduce PhysLayer, a novel framework enabling language-guided, depth-aware layered animation of static images. PhysLayer consists of three key components: First, a language-guided scene understanding module that utilizes vision foundation models to decompose scenes into depth-based layers by analyzing object composition, material properties, and physical parameters. Second, a depth-aware layered physics simulation that extends 2D rigid-body dynamics with depth motion and perspective-consistent scaling, enabling more realistic object interactions without requiring full 3D reconstruction. Third, a…
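The abstract does not say how depth motion and perspective-consistent scaling are computed, so the following is only a minimal sketch of one way such a layered step could look. It assumes a pinhole camera, under which a layer's apparent size is inversely proportional to its depth. The `Layer` class, its field names, the `step` and `perspective_scale` functions, and the semi-implicit Euler integrator are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical 2D layer carrying an extra depth coordinate."""
    x: float   # image-plane position, in pixels
    y: float
    z: float   # depth along the camera axis, must stay > 0
    vx: float  # image-plane velocity, in pixels per second
    vy: float
    vz: float  # depth velocity, in depth units per second
    z0: float  # depth at which the layer was cut from the source image


def perspective_scale(layer: Layer) -> float:
    """Apparent size relative to the source image.

    Assumption: pinhole projection, so size is proportional to 1 / depth.
    """
    return layer.z0 / layer.z


def step(layer: Layer, dt: float, gravity: float = 0.0) -> Layer:
    """Advance one layer by dt using semi-implicit Euler integration.

    Gravity acts along +y (image coordinates point downward).
    Collisions between layers are deliberately left out of this sketch.
    """
    vy = layer.vy + gravity * dt
    # Clamp depth so the layer never passes through the camera plane.
    z = max(layer.z + layer.vz * dt, 1e-6)
    return Layer(
        x=layer.x + layer.vx * dt,
        y=layer.y + vy * dt,
        z=z,
        vx=layer.vx,
        vy=vy,
        vz=layer.vz,
        z0=layer.z0,
    )
```

In this sketch, a layer that moves from depth 2 to depth 4 is drawn at half its original size, which is the kind of perspective-consistent scaling the abstract refers to; a renderer would then composite the layers back to front by `z`.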
