SirenPose: Dynamic Scene Reconstruction via Geometric Supervision
Kaitong Cai, Jensen Zhang, Jing Yang, Keze Wang

TL;DR
SirenPose is a novel method for dynamic scene reconstruction from monocular videos that combines geometric supervision, physics-inspired constraints, and high-frequency signal modeling to achieve accurate, consistent, and detailed 3D reconstructions.
Contribution
It introduces a geometry-aware loss with sinusoidal networks, expands the UniKPT dataset, and employs graph neural networks for improved dynamic scene and pose estimation.
Findings
Outperforms state-of-the-art methods on the Sintel, Bonn, and DAVIS benchmarks.
Reduces FVD by 17.8% and FID by 28.7%, and improves LPIPS by 6%.
Enhances temporal consistency and geometric accuracy in dynamic scenes.
Abstract
We introduce SirenPose, a geometry-aware loss formulation that integrates the periodic activation properties of sinusoidal representation networks with keypoint-based geometric supervision, enabling accurate and temporally consistent reconstruction of dynamic 3D scenes from monocular videos. Existing approaches often struggle with motion fidelity and spatiotemporal coherence in challenging settings involving fast motion, multi-object interaction, occlusion, and rapid scene changes. SirenPose incorporates physics-inspired constraints to enforce coherent keypoint predictions across both spatial and temporal dimensions, while leveraging high-frequency signal modeling to capture fine-grained geometric details. We further expand the UniKPT dataset to 600,000 annotated instances and integrate graph neural networks to model keypoint relationships and structural correlations. Extensive…
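The abstract does not give the loss itself, so the following is only an illustrative sketch of the two ingredients it names: a sinusoidal (SIREN-style) layer and a keypoint loss with a spatial term and a temporal-coherence term. The function names, the acceleration penalty, the weight `lambda_temporal`, and the frequency scale `omega_0 = 30` are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def siren_layer(x, weight, bias, omega_0=30.0):
    """One sinusoidal layer: sin(omega_0 * (x @ W + b)).

    omega_0 is a frequency scale; 30.0 is an assumed value here.
    """
    return np.sin(omega_0 * (x @ weight + bias))


def keypoint_geometric_loss(pred, target, lambda_temporal=0.1):
    """Hypothetical keypoint loss over a clip.

    pred, target: arrays of shape (T, K, D) holding K keypoints with
    D coordinates in each of T frames.
    """
    # Spatial term: mean squared distance to the annotated keypoints.
    spatial = np.mean(np.sum((pred - target) ** 2, axis=-1))

    # Temporal term (assumed): penalize keypoint acceleration, i.e. the
    # second finite difference along the time axis.
    if pred.shape[0] > 2:
        accel = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
        temporal = np.mean(np.sum(accel ** 2, axis=-1))
    else:
        temporal = 0.0

    return spatial + lambda_temporal * temporal


# Usage: keypoints moving at constant velocity, predicted exactly.
T, K, D = 5, 3, 2
frames = np.arange(T, dtype=float)[:, None, None]
target = frames * np.ones((T, K, D))
loss = keypoint_geometric_loss(target, target)
```

With constant-velocity keypoints that match the target, both terms vanish; perturbing a single frame raises the spatial term and the acceleration penalty together, which is the kind of temporal coherence the abstract describes.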
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
