Multi-modal Streaming 3D Object Detection
Mazen Abdelfattah, Kaiwen Yuan, Z. Jane Wang, and Rabab Ward

TL;DR
This paper introduces a multi-modal streaming 3D object detection framework that uses camera images instead of past LiDAR slices to give autonomous vehicles real-time, wide, and dense perception. It outperforms prior streaming models on the nuScenes benchmark.
Contribution
It presents an innovative camera-LiDAR fusion approach for streaming perception, addressing limitations of previous single-modality models and enhancing real-time 3D detection accuracy.
Findings
Outperforms prior streaming models on the nuScenes benchmark.
Surpasses full-scan detectors in speed and accuracy.
Robust to missing images, narrow slices, and calibration errors.
Abstract
Modern autonomous vehicles rely heavily on mechanical LiDARs for perception. Current perception methods generally require 360° point clouds, collected sequentially as the LiDAR scans the azimuth and acquires consecutive wedge-shaped slices. The acquisition latency of a full scan (~100 ms) may lead to outdated perception, which is detrimental to safe operation. Recent streaming perception works have proposed processing LiDAR slices directly and compensating for the narrow field of view (FoV) of a slice by reusing features from preceding slices. These works, however, are all based on a single modality and require past information, which may be outdated. Meanwhile, images from high-frequency cameras can support streaming models, as they provide a larger FoV than a LiDAR slice. However, this difference in FoV complicates sensor fusion. To address this research gap, we propose an…
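The slice-based acquisition described in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It only shows how a 360° scan could be partitioned into wedge-shaped azimuth slices, and how the per-slice acquisition time follows from the full-scan latency. The function names, the number of slices, and the sample points are all hypothetical. Only the roughly 100 ms full-scan latency comes from the abstract.

```python
import math


def slice_index(x, y, num_slices):
    """Return the index of the azimuth wedge containing the point (x, y).

    Assumption: slices are equal-width wedges, numbered counter-clockwise
    starting at the positive x-axis.
    """
    azimuth = math.atan2(y, x) % (2.0 * math.pi)  # map angle into [0, 2*pi]
    index = int(azimuth / (2.0 * math.pi) * num_slices)
    return min(index, num_slices - 1)  # guard against rounding at the wrap-around


def split_into_slices(points, num_slices):
    """Group (x, y, z) points into num_slices wedge-shaped slices."""
    slices = [[] for _ in range(num_slices)]
    for point in points:
        slices[slice_index(point[0], point[1], num_slices)].append(point)
    return slices


def slice_latency_ms(full_scan_ms, num_slices):
    """Acquisition time of one slice, assuming a constant rotation speed."""
    return full_scan_ms / num_slices


if __name__ == "__main__":
    # Hypothetical toy point cloud: one point per quadrant.
    cloud = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.2), (-1.0, -1.0, 0.1), (1.0, -1.0, 0.3)]
    for i, wedge in enumerate(split_into_slices(cloud, num_slices=4)):
        print(i, wedge)
    print(slice_latency_ms(100.0, 4), "ms per slice")
```

With a 100 ms full scan split into a hypothetical 4 slices, each slice takes 25 ms to acquire, which is why processing slices as they arrive can reduce perception latency. Each slice, however, covers only a quarter of the scene, which is the narrow-FoV problem the abstract describes.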
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Optical Sensing Technologies · Robotics and Sensor-Based Localization
