TL;DR
SynthFun3D is a novel method that generates synthetic 3D functionality segmentation data from action descriptions, improving 3D scene understanding models without manual annotation.
Contribution
SynthFun3D introduces a scalable approach for creating annotated 3D data from language instructions, which improves 3D functionality segmentation performance.
Findings
Adding synthetic data improves segmentation metrics by 2.2 mAP, 6.3 mAR, and 5.7 mIoU.
SynthFun3D constructs plausible 3D scenes from descriptions using object repositories.
Augmenting real data with synthetic data consistently boosts model accuracy.
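For readers unfamiliar with the mIoU figure quoted in the findings, the sketch below shows how a mean intersection-over-union is commonly computed for binary masks. It is a generic illustration, not the paper's evaluation code: the function names `mask_iou` and `mean_iou`, the flat 0/1 list representation of a mask, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equal-length sequences of 0/1.

    Assumption for this sketch: two empty masks count as a perfect match.
    """
    if len(pred) != len(gt):
        raise ValueError("masks must have the same length")
    # Points labeled positive in both masks.
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    # Points labeled positive in at least one mask.
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [mask_iou(p, g) for p, g in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [([1, 1], [1, 1]), ([1, 0], [0, 1])]
    print(mean_iou(pairs))
```

A real benchmark would apply the same idea to per-point masks over a 3D point cloud, and its exact averaging protocol may differ from this sketch.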
Abstract
3D functionality segmentation aims to identify the interactive element in a 3D scene required to perform an action described in free-form language (e.g., the handle to "Open the second drawer of the cabinet near the bed"). Progress has been constrained by the scarcity of annotated real-world data, as collecting and labeling fine-grained 3D masks is prohibitively expensive. To address this limitation, we introduce SynthFun3D, the first method for generating 3D functionality segmentation data directly from action descriptions. Given an action description, SynthFun3D constructs a plausible 3D scene by retrieving objects with part-level annotations from a large-scale asset repository and arranging them under spatial and semantic constraints. SynthFun3D renders multi-view images and automatically identifies the target functional element, producing precise ground-truth masks without manual…
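The retrieval and target-identification steps described in the abstract can be pictured with a toy sketch. Everything here is hypothetical and chosen for illustration: the `Part` and `Asset` classes, the `REPOSITORY` contents, the `mask_id` field, and the assumption that the action description has already been parsed into a (category, part name, ordinal) query. Scene layout and multi-view rendering are omitted, and none of this reflects the authors' actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str      # hypothetical part label, e.g. "handle"
    index: int     # 1-based ordinal among same-named parts
    mask_id: int   # hypothetical id linking the part to its mask


@dataclass
class Asset:
    category: str
    parts: list = field(default_factory=list)


# Hypothetical toy repository of assets with part-level annotations.
REPOSITORY = [
    Asset("cabinet", [Part("handle", 1, 101), Part("handle", 2, 102)]),
    Asset("bed"),
]


def retrieve(category):
    """Return the first asset of the requested category."""
    for asset in REPOSITORY:
        if asset.category == category:
            return asset
    raise KeyError(f"no asset of category {category!r}")


def target_mask_id(category, part_name, index):
    """Look up the annotated part that the action refers to."""
    asset = retrieve(category)
    for part in asset.parts:
        if part.name == part_name and part.index == index:
            return part.mask_id
    raise LookupError(f"{category!r} has no {part_name!r} #{index}")


if __name__ == "__main__":
    # "Open the second drawer of the cabinet" -> second handle of a cabinet.
    print(target_mask_id("cabinet", "handle", 2))
```

Because each part already carries an annotation in the repository, the ground-truth target falls out of a lookup rather than manual labeling, which is the property the abstract emphasizes.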
