BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

Xuewu Lin; Tianwei Lin; Lichao Huang; Hongyu Xie; Zhizhong Su

arXiv:2411.14869 · cs.CV · December 2, 2024

TL;DR

BIP3D is an image-centric 3D perception model that combines pre-trained 2D vision foundation models with explicit 3D position encoding to improve embodied agents' understanding of their environment. It is reported to outperform the prior state of the art on the EmbodiedScan benchmark.

Contribution

The paper introduces BIP3D, an approach that combines image features with explicit 3D position encoding, with the aim of improving perception performance beyond what point cloud-based methods achieve.

Findings

01 Outperforms the state of the art on the EmbodiedScan benchmark

02 Achieves a 5.69% improvement in 3D detection

03 Achieves a 15.25% improvement in 3D visual grounding

Abstract

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to their inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current…
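
This page does not describe how the "explicit 3D position encoding" is computed. As a rough illustration of the general idea of tying an image pixel to a 3D position, the sketch below back-projects a pixel with a known depth through a standard pinhole camera model. The function name `backproject_pixel`, the intrinsics, and the depth value are all hypothetical; none of them are taken from BIP3D or its repository.

```python
from typing import Tuple


def backproject_pixel(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame.

    Hypothetical helper for illustration only (not part of BIP3D).
    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Made-up intrinsics and depth, chosen only to exercise the function.
point = backproject_pixel(
    u=400.0, v=300.0, depth=2.0,
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
print(point)
```

A pixel at the principal point `(cx, cy)` maps to a point on the optical axis, so its `x` and `y` are zero regardless of depth. A model that encodes 3D positions for image features would typically feed coordinates like these into a learned embedding, but that step is not specified here.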

Code & Models

Repositories

HorizonRobotics/BIP3D
PyTorch · Official

Models

🤗 HorizonRobotics/BIP3D
model · ♡ 4

Taxonomy

Topics: Robotics and Automated Systems · Human Pose and Action Recognition · 3D Surveying and Cultural Heritage