3D Scene Grammar for Parsing RGB-D Pointclouds
Abhishek Anand, Sherwin Li

TL;DR
This paper introduces a grammar-based approach to 3D scene understanding from RGB-D point clouds. Probabilistic rules capture object structure and scene composition, which enables efficient parsing and segment labeling.
Contribution
The paper presents a novel 3D scene grammar model that captures object and scene structure, with efficient training and scalable parsing, and releases a new dataset and code.
Findings
Outperforms existing segment-labeling algorithms in accuracy.
Training is very fast, taking only seconds.
Parsing scales linearly with the number of grammar rules.
Abstract
We pose 3D scene-understanding as a problem of parsing in a grammar. A grammar helps us capture the compositional structure of real-world objects, e.g., a chair is composed of a seat, a back-rest and some legs. Having multiple rules for an object helps us capture structural variations in objects, e.g., a chair can optionally also have arm-rests. Finally, having rules to capture composition at different levels helps us formulate the entire scene-processing pipeline as a single problem of finding the most likely parse-tree: small segments combine to form parts of objects, parts to objects and objects to a scene. We attach a generative probability model to our grammar by having a feature-dependent probability function for every rule. We evaluated it by extracting labels for every segment and comparing the results with the state-of-the-art segment-labeling algorithm. Our algorithm was…
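The idea in the abstract (several rules per object, a feature-dependent probability for every rule, and a search for the most likely parse) can be illustrated with a toy sketch. This is not the authors' model or code. The rule set, the single height feature, the Gaussian scores and every number below are assumptions made up for illustration, and each part is simplified to exactly one segment so that a brute-force search stays readable.

```python
import math
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, Optional, Sequence, Tuple


def log_gauss(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian; stands in for a learned feature model."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


@dataclass(frozen=True)
class Rule:
    lhs: str                                       # symbol the rule produces
    rhs: Tuple[str, ...]                           # symbols it is composed of
    log_score: Callable[[Sequence[float]], float]  # feature-dependent log-probability


# Hypothetical part rules: a part explains one segment, scored by its height in metres.
PART_RULES: Dict[str, Rule] = {
    "seat": Rule("seat", ("segment",), lambda h: log_gauss(h[0], 0.45, 0.10)),
    "back_rest": Rule("back_rest", ("segment",), lambda h: log_gauss(h[0], 0.80, 0.10)),
    "legs": Rule("legs", ("segment",), lambda h: log_gauss(h[0], 0.20, 0.10)),
    "arm_rest": Rule("arm_rest", ("segment",), lambda h: log_gauss(h[0], 0.65, 0.10)),
}

# Hypothetical object rules: two variants of a chair, with and without an arm-rest.
# Heights arrive in the same order as the rule's right-hand side.
OBJECT_RULES = [
    Rule("chair", ("seat", "back_rest", "legs"),
         lambda h: log_gauss(h[1] - h[0], 0.35, 0.10)),
    Rule("chair", ("seat", "back_rest", "legs", "arm_rest"),
         lambda h: log_gauss(h[1] - h[0], 0.35, 0.10)
         + log_gauss(h[3] - h[0], 0.20, 0.10)),
]


def best_parse(heights: Sequence[float],
               root: str = "chair") -> Optional[Tuple[float, Dict[int, str]]]:
    """Brute-force search for the highest-scoring parse that uses every segment.

    Returns (log_score, {segment_index: part_label}), or None when no rule
    for `root` has as many parts as there are segments.
    """
    best: Optional[Tuple[float, Dict[int, str]]] = None
    for rule in OBJECT_RULES:
        if rule.lhs != root or len(rule.rhs) != len(heights):
            continue
        # Try every assignment of segments to the parts named by the rule.
        for order in permutations(range(len(heights))):
            assigned = [heights[i] for i in order]
            score = rule.log_score(assigned)
            score += sum(PART_RULES[part].log_score([h])
                         for part, h in zip(rule.rhs, assigned))
            if best is None or score > best[0]:
                best = (score, dict(zip(order, rule.rhs)))
    return best


if __name__ == "__main__":
    print(best_parse([0.80, 0.20, 0.45]))        # three segments: plain chair rule
    print(best_parse([0.80, 0.20, 0.45, 0.65]))  # four segments: arm-rest variant
```

The exhaustive search over permutations is only workable for a handful of segments; it is used here to keep the scoring explicit, not as a statement about how the paper's parser searches.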
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
