PanopticQuery: Unified Query-Time Reasoning for 4D Scenes
Ruilin Tang, Yang Zhou, Zhong Ye, Wenxi Liu, Yan Huang, and Shengfeng He

TL;DR
PanopticQuery is a unified framework that enhances 4D scene understanding by integrating high-fidelity reconstruction with semantic reasoning for complex language queries.
Contribution
It introduces a novel query-time reasoning method combining 4D Gaussian Splatting and multi-view semantic consensus for dynamic scene analysis.
Findings
Sets a new state of the art on complex language queries in 4D scenes.
Effectively handles attributes, actions, spatial relations, and interactions.
Provides a new benchmark, Panoptic-L4D, for language-based dynamic scene querying.
Abstract
Understanding dynamic 4D environments through natural language queries requires not only accurate scene reconstruction but also robust semantic grounding across space, time, and viewpoints. While recent methods using neural representations have advanced 4D reconstruction, they remain limited in contextual reasoning, especially for complex semantics such as interactions, temporal actions, and spatial relations. A key challenge lies in transforming noisy, view-dependent predictions into globally consistent 4D interpretations. We introduce PanopticQuery, a framework for unified query-time reasoning in 4D scenes. Our approach builds on 4D Gaussian Splatting for high-fidelity dynamic reconstruction and introduces a multi-view semantic consensus mechanism that grounds natural language queries by aggregating 2D semantic predictions across multiple views and time frames. This process filters…
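To make the idea of aggregating 2D predictions across views and time frames concrete, here is a minimal sketch of one possible consensus rule: a per-Gaussian vote over all observations in which that Gaussian is visible. This is an illustration only, not the authors' method; the abstract above does not specify the aggregation rule. The function name `consensus_select`, the `visible`/`hit` array layout, and the `min_ratio` threshold are all assumptions introduced for this example.

```python
import numpy as np


def consensus_select(visible, hit, min_ratio=0.5):
    """Hypothetical multi-view vote over Gaussians (illustrative assumption).

    visible[o, g] is True when Gaussian g projects into observation o,
    where an observation is one (view, time frame) pair.
    hit[o, g] is True when the 2D prediction for the query in observation o
    covers the projection of Gaussian g.
    Returns (selected, ratio): a boolean mask over Gaussians and the
    fraction of visible observations that voted for each Gaussian.
    """
    visible = np.asarray(visible, dtype=bool)
    # A vote only counts in observations where the Gaussian is visible.
    hit = np.asarray(hit, dtype=bool) & visible

    seen = visible.sum(axis=0)   # observations that see each Gaussian
    votes = hit.sum(axis=0)      # observations that vote for each Gaussian

    # Agreement ratio; Gaussians never observed keep a ratio of 0.
    ratio = np.divide(
        votes, seen,
        out=np.zeros(seen.shape, dtype=float),
        where=seen > 0,
    )
    # Keep Gaussians that were observed and that enough observations agree on.
    selected = (seen > 0) & (ratio >= min_ratio)
    return selected, ratio


# Toy input: 3 observations, 4 Gaussians.
visible = [[1, 1, 1, 0],
           [1, 1, 0, 0],
           [1, 1, 1, 0]]
hit = [[1, 0, 1, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0]]
selected, ratio = consensus_select(visible, hit, min_ratio=0.5)
```

In this toy input, a Gaussian flagged by only one of three observations is dropped, which is the sense in which a vote of this kind suppresses noisy, view-dependent predictions. A real pipeline would also need the projection and visibility tests that produce `visible` and `hit`; they are left out here.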
