UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

Jiaying Lin; Dan Xu

arXiv:2603.23478·cs.CV·March 25, 2026

TL;DR

UniFunc3D is a unified, training-free framework that uses a multimodal large language model as an active observer, applying spatial-temporal reasoning to improve 3D functionality segmentation in complex scenes.

Contribution

It presents a novel active spatial-temporal grounding approach that integrates semantic, temporal, and spatial reasoning in a single pass without task-specific training.

Findings

01 Achieves 59.9% mIoU improvement on SceneFun3D

02 Surpasses existing training-free and training-based methods

03 Demonstrates effective coarse-to-fine reasoning strategy

Abstract

Functionality segmentation in 3D scenes requires an agent to ground implicit natural-language instructions into precise masks of fine-grained interactive elements. Existing methods rely on fragmented pipelines that suffer from visual blindness during initial task parsing. We observe that these methods are limited by single-scale, passive and heuristic frame selection. We present UniFunc3D, a unified and training-free framework that treats the multimodal large language model as an active observer. By consolidating semantic, temporal, and spatial reasoning into a single forward pass, UniFunc3D performs joint reasoning to ground task decomposition in direct visual evidence. Our approach introduces active spatial-temporal grounding with a coarse-to-fine strategy. This allows the model to select correct video frames adaptively and focus on high-detail interactive parts while preserving the…
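
To make the coarse-to-fine idea in the abstract concrete, here is a minimal sketch of two-stage frame selection. It is an illustration under stated assumptions, not the authors' implementation: the function name `coarse_to_fine_select`, the `coarse_stride` parameter, and the `score` callback (a stand-in for whatever relevance judgment the multimodal model would provide for a frame) are all hypothetical.

```python
from typing import Callable


def coarse_to_fine_select(
    num_frames: int,
    score: Callable[[int], float],
    coarse_stride: int = 8,
) -> int:
    """Return the index of the best frame found by a two-stage search.

    `score` is a hypothetical callback that rates how relevant a frame is
    to the instruction; the paper's actual scoring mechanism is not shown.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if coarse_stride <= 0:
        raise ValueError("coarse_stride must be positive")

    # Coarse stage: rate a sparse, evenly strided subset of the video.
    best_coarse = max(range(0, num_frames, coarse_stride), key=score)

    # Fine stage: densely re-rate the frames around the coarse winner.
    lo = max(0, best_coarse - coarse_stride + 1)
    hi = min(num_frames, best_coarse + coarse_stride)
    return max(range(lo, hi), key=score)
```

With a toy score such as `lambda i: -abs(i - 21)` over 40 frames, the coarse stage lands on a nearby strided frame and the fine stage then refines to frame 21. The trade-off is that the search rates far fewer frames than an exhaustive scan, at the risk of missing a narrow peak that falls between coarse samples.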

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis