TL;DR
LiMA is a novel framework that enhances LiDAR representation learning by capturing long-term spatiotemporal cues through cross-view and cross-sequence memory aggregation, improving perception tasks in autonomous driving.
Contribution
The paper introduces LiMA, a framework that explicitly models long-range temporal correlations and multi-view fusion for better LiDAR representations, with no extra computational cost.
Findings
Significant improvements in LiDAR semantic segmentation.
Enhanced 3D object detection accuracy.
Effective cross-view and cross-sequence memory alignment.
Abstract
LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer-range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR…
