Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection
Yanjun Liu, Wenming Yang

TL;DR
Explicit3D introduces a graph-based approach with spatial inference for single image 3D object detection, leveraging relational geometry and semantics to improve accuracy and efficiency in indoor scene understanding.
Contribution
The paper presents a novel dynamic sparse graph pipeline with a pruning algorithm and new loss functions that explicitly model spatial relationships between objects.
Findings
Outperforms state-of-the-art methods on the SUN RGB-D dataset
Balances detection accuracy against computational cost
Models geometric consistency between objects explicitly
Abstract
Indoor 3D object detection is an essential task in single image scene understanding, impacting spatial cognition fundamentally in visual reasoning. Existing works on 3D object detection from a single image either pursue this goal through independent predictions of each object or implicitly reason over all possible objects, failing to harness relational geometric information between objects. To address this problem, we propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features. Taking the efficiency into consideration, we further define a relatedness score and design a novel dynamic pruning algorithm followed by a cluster sampling method for sparse scene graph generation and updating. Furthermore, our Explicit3D introduces homogeneous matrices and defines new relative loss and corner loss to model the spatial difference between target pairs…
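The abstract mentions homogeneous matrices, a relative loss, and a corner loss but is cut off before defining them. The sketch below is one plausible reading, not the authors' formulation: each box pose is written as a 4x4 homogeneous matrix, the pose of one object is expressed in the frame of another, and both quantities are compared with a plain L1 distance. The function names, the `(center, size, yaw)` box tuple, the choice of y as the up axis, and the L1 distance are all assumptions made for illustration.

```python
import numpy as np

def pose_matrix(center, yaw):
    """4x4 homogeneous pose for a box rotated by `yaw` about the y (up) axis.
    The axis convention is an assumption, not taken from the paper."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    T[:3, 3] = center
    return T

def relative_pose(T_i, T_j):
    """Pose of object j expressed in the local frame of object i."""
    return np.linalg.inv(T_i) @ T_j

def box_corners(center, size, yaw):
    """Eight box corners in the shared (camera/world) frame, shape (8, 3)."""
    signs = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                     dtype=float)
    local = 0.5 * signs * np.asarray(size, dtype=float)
    homo = np.hstack([local, np.ones((8, 1))])
    return (pose_matrix(center, yaw) @ homo.T).T[:, :3]

def corner_l1(pred, gt):
    """Hypothetical corner loss: mean L1 distance between matching corners."""
    return float(np.mean(np.abs(box_corners(*pred) - box_corners(*gt))))

def relative_l1(pred_i, pred_j, gt_i, gt_j):
    """Hypothetical relative loss: L1 distance between the predicted and
    ground-truth relative pose matrices of an object pair."""
    rel_pred = relative_pose(pose_matrix(pred_i[0], pred_i[2]),
                             pose_matrix(pred_j[0], pred_j[2]))
    rel_gt = relative_pose(pose_matrix(gt_i[0], gt_i[2]),
                           pose_matrix(gt_j[0], gt_j[2]))
    return float(np.mean(np.abs(rel_pred - rel_gt)))
```

Under these assumptions the two terms penalize different errors. Shifting both predicted boxes by the same offset leaves the relative term at zero while the corner term grows, which is one reason a pairwise term and a per-object term might be combined.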
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Pruning
