Functionality understanding and segmentation in 3D scenes
Jaime Corsetti, Francesco Giuliari, Alice Fasoli, Davide Boscaini, Fabio Poiesi

TL;DR
This paper introduces Fun3DU, a training-free method that uses pre-trained language and vision models to understand and segment functional objects in 3D scenes based on natural language tasks.
Contribution
We present Fun3DU, the first dedicated approach for functionality understanding in 3D scenes, leveraging Chain-of-Thought reasoning and multi-view segmentation without training.
Findings
Outperforms state-of-the-art open-vocabulary 3D segmentation methods
Successfully segments objects based on natural language task descriptions
Evaluated on the SceneFun3D dataset, with over 3000 task descriptions
Abstract
Understanding functionalities in 3D scenes involves interpreting natural language descriptions to locate functional interactive objects, such as handles and buttons, in a 3D environment. Functionality understanding is highly challenging, as it requires both world knowledge to interpret language and spatial perception to identify fine-grained objects. For example, given a task like 'turn on the ceiling light', an embodied AI agent must infer that it needs to locate the light switch, even though the switch is not explicitly mentioned in the task description. To date, no dedicated methods have been developed for this problem. In this paper, we introduce Fun3DU, the first approach designed for functionality understanding in 3D scenes. Fun3DU uses a language model to parse the task description through Chain-of-Thought reasoning in order to identify the object of interest. The identified…
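The language-parsing step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the prompt wording, the `functional_object` key, and the helper names `build_prompt` and `parse_answer` are hypothetical and do not come from the paper, and a canned string stands in for a real language-model call.

```python
import json

# Hypothetical Chain-of-Thought prompt template (not the paper's actual prompt).
COT_TEMPLATE = (
    "Task: {task}\n"
    "Think step by step about which object a person must physically interact "
    "with to complete the task, even if the task does not name it.\n"
    'Then answer on the last line as JSON: {{"functional_object": "..."}}'
)


def build_prompt(task: str) -> str:
    """Fill the template with a natural language task description."""
    return COT_TEMPLATE.format(task=task)


def parse_answer(response: str) -> dict:
    """Read the JSON object from the last non-empty line of a model response."""
    lines = [ln for ln in response.splitlines() if ln.strip()]
    return json.loads(lines[-1])


# Canned response standing in for a language-model call (assumption).
canned_response = (
    "The ceiling light is controlled by a switch on the wall.\n"
    "So the agent must locate the switch, not the light itself.\n"
    '{"functional_object": "light switch"}'
)

prompt = build_prompt("turn on the ceiling light")
answer = parse_answer(canned_response)
print(answer["functional_object"])
```

Asking for the final answer on its own line as JSON keeps the free-form reasoning separate from the field that a downstream segmentation stage would consume.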
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image Processing and 3D Reconstruction
