OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang, Jianhua Xu, Tao Tang, Haiyang Sun, Xin Yu, Zi Huang, Kaicheng Yu

TL;DR
OpenSight introduces a novel 2D-3D modeling framework utilizing geometric priors and cross-modal fusion to enhance open-vocabulary LiDAR-based object detection, addressing overfitting issues in existing methods.
Contribution
The paper presents a new framework that combines 2D-3D geometric priors, temporal and spatial constraints, and cross-modal fusion for improved open-vocabulary detection in LiDAR data.
Findings
Achieves state-of-the-art open-vocabulary detection performance.
Effectively detects objects in new categories.
Utilizes geometric priors and cross-modal fusion for robust perception.
Abstract
Traditional LiDAR-based object detection research primarily focuses on closed-set scenarios, which falls short in complex real-world applications. Directly transferring existing 2D open-vocabulary models with some known LiDAR classes for open-vocabulary ability, however, tends to suffer from over-fitting problems: the obtained model will detect the known objects even when presented with a novel category. In this paper, we propose OpenSight, a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection. OpenSight utilizes 2D-3D geometric priors for the initial discernment and localization of generic objects, followed by a more specific semantic interpretation of the detected objects. The process begins by generating 2D boxes for generic objects from the camera images that accompany the LiDAR data. These 2D boxes, together with LiDAR points, are then lifted back into the LiDAR…
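The abstract is truncated before it explains how the 2D boxes and LiDAR points are lifted, so the following is only a generic sketch of one common way to associate a 2D box with LiDAR points, not the authors' procedure. It assumes a pinhole camera model and a known LiDAR-to-camera calibration; the function names `points_in_box_frustum` and `axis_aligned_3d_box`, and the parameters `lidar_to_cam` and `K`, are hypothetical and do not come from the paper.

```python
import numpy as np


def points_in_box_frustum(points_lidar, lidar_to_cam, K, box_2d):
    """Return the LiDAR points whose image projection falls inside a 2D box.

    points_lidar: (N, 3) array of xyz coordinates in the LiDAR frame.
    lidar_to_cam: (4, 4) homogeneous LiDAR-to-camera transform (assumed known).
    K:            (3, 3) pinhole intrinsics with last row [0, 0, 1] (assumed known).
    box_2d:       (x_min, y_min, x_max, y_max) in pixels.
    """
    n = points_lidar.shape[0]
    homog = np.hstack([points_lidar, np.ones((n, 1))])
    # Move the points into the camera frame.
    cam = (lidar_to_cam @ homog.T).T[:, :3]
    # Only points in front of the camera can appear in the image.
    in_front = cam[:, 2] > 1e-6
    # Use a dummy depth for points behind the camera to avoid dividing by zero;
    # those points are masked out by `in_front` below.
    z = np.where(in_front, cam[:, 2], 1.0)
    uvw = (K @ cam.T).T
    u = uvw[:, 0] / z
    v = uvw[:, 1] / z
    x_min, y_min, x_max, y_max = box_2d
    inside = in_front & (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points_lidar[inside]


def axis_aligned_3d_box(points):
    """Center and size of the tightest axis-aligned box around the points."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

In practice the selected points usually include background behind the object, so a real pipeline would cluster or filter them before fitting a box; the axis-aligned fit here is the simplest possible stand-in.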
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: ALIGN
