TL;DR
FunFact is a probabilistic framework that constructs functional 3D scene graphs from RGB-D images, leveraging joint inference and priors to improve understanding and confidence calibration.
Contribution
It introduces a novel probabilistic approach for functional scene graph construction that considers scene-wide interdependence and uses foundation models and priors for better inference.
Findings
Improves node and relation discovery recall.
Reduces calibration error for ambiguous relations.
Outperforms existing methods on multiple datasets.
Abstract
Recent work in 3D scene understanding is moving beyond purely spatial analysis toward functional scene understanding. However, existing methods often consider functional relationships between object pairs in isolation, failing to capture the scene-wide interdependence that humans use to resolve ambiguity. We introduce FunFact, a framework for constructing probabilistic open-vocabulary functional 3D scene graphs from posed RGB-D images. FunFact first builds an object- and part-centric 3D map and uses foundation models to propose semantically plausible functional relations. These candidates are converted into factor graph variables and constrained by both LLM-derived common-sense priors and geometric priors. This formulation enables joint probabilistic inference over all functional edges and their marginals, yielding substantially better calibrated confidence scores. To benchmark this…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
