TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data
Naor Cohen, Roy Orfaig, Ben-Zion Bobrovsky

TL;DR
This paper introduces TexLiDAR, a novel method that leverages panoramic 2D images from advanced LiDAR sensors and large pre-trained models to improve text understanding and object detection in LiDAR data, bypassing the limitations of 3D point cloud processing.
Contribution
The paper proposes using 2D panoramic images from LiDAR sensors with large models like Florence 2 for zero-shot captioning and detection, offering a new approach to LiDAR-text integration.
Findings
Florence 2 produces more informative captions than existing methods.
The approach achieves superior object detection performance.
It enables real-time, high-accuracy detection in challenging scenarios.
Abstract
Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed-resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR-based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks…
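To make the 2D-image route more concrete: a pre-trained vision-language model generally expects an 8-bit, three-channel image, while a LiDAR depth, signal, or ambient channel is a single plane of raw values. The sketch below shows one plausible way to bridge that gap, using percentile clipping followed by linear scaling and channel replication. It is an illustrative assumption only; this summary does not describe the paper's actual preprocessing, and the function name `to_uint8_rgb` and its parameters are hypothetical.

```python
def to_uint8_rgb(channel, lo_pct=1.0, hi_pct=99.0):
    """Scale a single-channel panoramic image to 8-bit grayscale RGB triples.

    `channel` is a list of rows of raw sensor values (e.g. depth or signal).
    This is an assumed preprocessing step, not the paper's documented method.
    """
    flat = sorted(v for row in channel for v in row)
    if not flat:
        return []

    def percentile(p):
        # Nearest-rank style lookup into the sorted values.
        return flat[round((len(flat) - 1) * p / 100.0)]

    lo, hi = percentile(lo_pct), percentile(hi_pct)
    span = (hi - lo) or 1  # avoid division by zero on a constant image

    out = []
    for row in channel:
        out_row = []
        for v in row:
            clipped = min(max(v, lo), hi)           # suppress outliers
            g = round(255 * (clipped - lo) / span)  # map to 0..255
            out_row.append((g, g, g))               # replicate to 3 channels
        out.append(out_row)
    return out


if __name__ == "__main__":
    tiny = [[0, 100], [200, 400]]
    print(to_uint8_rgb(tiny, lo_pct=0.0, hi_pct=100.0))
```

The resulting rows of RGB triples could then be wrapped in whatever image type the chosen model's preprocessor accepts. Percentile clipping is used here because a few very bright or very distant returns would otherwise compress the rest of the scene into a narrow gray range.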
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
Methods: Florence · Contrastive Language-Image Pre-training
