TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data

Naor Cohen; Roy Orfaig; Ben-Zion Bobrovsky

arXiv:2502.04385 · cs.CV · February 24, 2025

TL;DR

This paper introduces TexLiDAR, a novel method that leverages panoramic 2D images from advanced LiDAR sensors and large pre-trained models to improve text understanding and object detection in LiDAR data, bypassing the limitations of 3D point cloud processing.

Contribution

The paper proposes using 2D panoramic images from LiDAR sensors with large models like Florence 2 for zero-shot captioning and detection, offering a new approach to LiDAR-text integration.
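Models such as Florence 2 are pre-trained on ordinary photographs, so the LiDAR panoramas must first be turned into 8-bit, 3-channel images. The following is a minimal sketch of one way to do that. It assumes each OS1 channel arrives as an already destaggered 2D NumPy array (for example 128 rows by 1024 columns). The percentile clipping, the helper names `to_uint8`, `gray_to_rgb`, and `false_color`, and the depth/signal/ambient to R/G/B assignment are illustrative assumptions. This page does not say how TexLiDAR itself normalizes the images.

```python
import numpy as np


def to_uint8(channel: np.ndarray, lo_pct: float = 1.0, hi_pct: float = 99.0) -> np.ndarray:
    """Clip one LiDAR channel to a percentile window and rescale it to 0..255."""
    channel = np.asarray(channel, dtype=np.float64)
    valid = channel[np.isfinite(channel)]
    if valid.size == 0:
        return np.zeros(channel.shape, dtype=np.uint8)
    lo, hi = np.percentile(valid, [lo_pct, hi_pct])
    if hi <= lo:  # constant image: nothing to stretch
        return np.zeros(channel.shape, dtype=np.uint8)
    # Clipping maps +inf/-inf to hi/lo; only NaN survives, handled next.
    scaled = (np.clip(channel, lo, hi) - lo) / (hi - lo)
    scaled = np.nan_to_num(scaled, nan=0.0)  # NaN pixels become black
    return np.round(scaled * 255.0).astype(np.uint8)


def gray_to_rgb(channel: np.ndarray) -> np.ndarray:
    """Replicate one normalized channel into an H x W x 3 image."""
    gray = to_uint8(channel)
    return np.stack([gray, gray, gray], axis=-1)


def false_color(depth: np.ndarray, signal: np.ndarray, ambient: np.ndarray) -> np.ndarray:
    """Hypothetical false-color composite: depth, signal, ambient -> R, G, B."""
    return np.stack([to_uint8(depth), to_uint8(signal), to_uint8(ambient)], axis=-1)
```

Either output can be wrapped with Pillow's `Image.fromarray` before it is handed to an image processor. Zero-valued pixels, which in range images usually mean "no return", are treated here as ordinary values. Masking them out before computing the percentiles may give a better contrast stretch.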

Findings

1. Florence 2 produces more informative captions than existing methods.
2. The approach achieves superior object detection performance.
3. It enables real-time, high-accuracy detection in challenging scenarios.

Abstract

Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into the CLIP text-image space. However, these approaches rely on 3D point clouds, which are challenging to encode efficiently and to process with neural networks. Advanced LiDAR sensors such as the Ouster OS1 produce, in addition to 3D point clouds, fixed-resolution depth, signal, and ambient panoramic 2D images, which open new opportunities for LiDAR-based tasks. In this work, we propose an alternative approach to connecting LiDAR data with text that leverages the 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks…
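The panoramic images have a fixed resolution and share one pixel grid with the depth image, so a 2D detection can be read back in sensor terms. The sketch below is an illustration and not a step this page attributes to TexLiDAR. It makes these assumptions:

- Boxes are given in pixel coordinates `(x0, y0, x1, y1)`.
- The W columns cover 360° uniformly, starting at a hypothetical offset `az0_deg` and increasing left to right. The real Ouster direction and zero angle should come from the sensor metadata.
- Zero-valued depth pixels mean "no return".

Boxes that wrap around the left/right seam of the panorama are not handled.

```python
import math

import numpy as np


def x_to_azimuth_deg(x: float, width: int, az0_deg: float = 0.0) -> float:
    """Azimuth in [0, 360) of a continuous horizontal pixel coordinate x (column i spans [i, i+1))."""
    return (az0_deg + x * 360.0 / width) % 360.0


def box_bearing_and_range(box, depth: np.ndarray, az0_deg: float = 0.0):
    """Return (azimuth_deg, median_range) for a box (x0, y0, x1, y1) on a depth panorama."""
    x0, y0, x1, y1 = box
    h, w = depth.shape
    # Convert the continuous box to integer pixel slices, clamped to the image.
    c0, c1 = max(math.floor(x0), 0), min(math.ceil(x1), w)
    r0, r1 = max(math.floor(y0), 0), min(math.ceil(y1), h)
    patch = depth[r0:r1, c0:c1]
    valid = patch[patch > 0]  # assumed: 0 means no LiDAR return
    rng = float(np.median(valid)) if valid.size else math.nan
    # Bearing is taken at the horizontal center of the box.
    return x_to_azimuth_deg((x0 + x1) / 2.0, w, az0_deg), rng
```

The median is used rather than the mean so that a few background pixels inside a loose box do not dominate the range estimate.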

Code & Models

Repositories

AIROTAU/TexLiDAR
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: Florence · Contrastive Language-Image Pre-training