Action-guided generation of 3D functionality segmentation data

Jaime Corsetti; Francesco Giuliari; Davide Boscaini; Pedro Hermosilla; Andrea Pilzer; Guofeng Mei; Alexandros Delitzas; Francis Engelmann; Fabio Poiesi

arXiv:2511.23230 · cs.CV · April 7, 2026


TL;DR

SynthFun3D is a novel method that generates synthetic 3D functionality segmentation data from action descriptions, improving 3D scene understanding models without manual annotation.

Contribution

It introduces a scalable approach to create annotated 3D data from language instructions, enhancing 3D functionality segmentation performance.

Findings

1. Synthetic data improves segmentation metrics by +2.2 mAP, +6.3 mAR, and +5.7 mIoU.
2. SynthFun3D constructs plausible 3D scenes from action descriptions using object repositories.
3. Augmenting real data with synthetic data consistently boosts model accuracy.
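The findings report gains in mAP, mAR and mIoU. The sketch below shows one common way to compute mIoU for binary per-point masks. It assumes that IoU is computed per action description between the predicted and ground-truth masks and then averaged over samples. The exact benchmark protocol is not stated in this summary, including whether averaging is per sample or per class and which IoU thresholds mAP and mAR use. The helper names `mask_iou` and `mean_iou` are hypothetical.

```python
def mask_iou(pred, gt):
    """IoU between two boolean per-point masks of equal length."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same length")
    inter = sum(p and g for p, g in zip(pred, gt))  # points marked in both masks
    union = sum(p or g for p, g in zip(pred, gt))   # points marked in either mask
    return inter / union if union else 1.0  # both masks empty: treat as a perfect match


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) pairs, one pair per action description."""
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    pred = [True, True, False, False]
    gt = [True, False, True, False]
    print(mask_iou(pred, gt))                # about 0.333
    print(mean_iou([(pred, gt), (gt, gt)]))  # about 0.667
```

Under this assumption, a "+5.7 mIoU" gain means that the per-sample overlap between the predicted and ground-truth functional-element masks rises by 5.7 points on average.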

Abstract

3D functionality segmentation aims to identify the interactive element in a 3D scene required to perform an action described in free-form language (e.g., the handle to "Open the second drawer of the cabinet near the bed"). Progress has been constrained by the scarcity of annotated real-world data, as collecting and labeling fine-grained 3D masks is prohibitively expensive. To address this limitation, we introduce SynthFun3D, the first method for generating 3D functionality segmentation data directly from action descriptions. Given an action description, SynthFun3D constructs a plausible 3D scene by retrieving objects with part-level annotations from a large-scale asset repository and arranging them under spatial and semantic constraints. SynthFun3D renders multi-view images and automatically identifies the target functional element, producing precise ground-truth masks without manual…
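The abstract describes the generation pipeline only at a high level. The following toy sketch is an illustration under stated assumptions, not SynthFun3D's implementation. It shows why ground-truth masks come for free: the generator itself chooses which object part the action refers to, so every sampled point can carry its source part label, and the target mask reduces to a label comparison.

All of the following are hypothetical:

- the asset table `ASSETS`;
- the functions `place_scene` and `ground_truth_mask`;
- rejection sampling as the placement strategy;
- reading "near" as footprint centres within 2 m.

Parsing the description into (anchor, target object, target part) is assumed to happen upstream. Rendering and real geometry are not modelled: points are represented only by their part labels.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy asset repository: floor footprint (width, depth) in metres plus
# part-level labels. Real repositories store meshes with annotated parts.
ASSETS = {
    "bed": {"size": (2.0, 1.6), "parts": ["frame", "mattress"]},
    "cabinet": {
        "size": (0.8, 0.5),
        "parts": ["body", "drawer1_handle", "drawer2_handle", "drawer3_handle"],
    },
}


@dataclass
class Placed:
    name: str
    x: float  # minimum corner of the footprint
    y: float
    w: float
    d: float

    def center(self):
        return (self.x + self.w / 2, self.y + self.d / 2)


def overlaps(a, b):
    """Axis-aligned footprint intersection test (a simple spatial constraint)."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.d and b.y < a.y + a.d


def near(a, b, max_dist):
    """Hypothetical reading of 'near': footprint centres within max_dist metres."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by) <= max_dist


def place_scene(anchor, target, room=(5.0, 5.0), max_dist=2.0, tries=1000, seed=0):
    """Rejection-sample a layout where `target` is near `anchor` and does not overlap it."""
    rng = random.Random(seed)

    def sample(name):
        w, d = ASSETS[name]["size"]
        return Placed(name, rng.uniform(0, room[0] - w), rng.uniform(0, room[1] - d), w, d)

    a = sample(anchor)
    for _ in range(tries):
        t = sample(target)
        if not overlaps(a, t) and near(a, t, max_dist):
            return [a, t]
    raise RuntimeError("no layout satisfied the constraints")


def ground_truth_mask(scene, target_obj, target_part, pts_per_part=10):
    """Give every sampled point its (object, part) label; the mask is a label match."""
    labels = [
        (obj.name, part)
        for obj in scene
        for part in ASSETS[obj.name]["parts"]
        for _ in range(pts_per_part)
    ]
    mask = [label == (target_obj, target_part) for label in labels]
    return labels, mask


if __name__ == "__main__":
    # "Open the second drawer of the cabinet near the bed", parsed by hand here.
    scene = place_scene(anchor="bed", target="cabinet")
    labels, mask = ground_truth_mask(scene, "cabinet", "drawer2_handle")
    print(len(labels), sum(mask))  # 60 10
```

Rejection sampling is used here only because it is the simplest way to satisfy a "near" plus "no overlap" constraint. A real system arranging many objects would likely need a stronger layout solver.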


Code & Models

Repositories

github
