Sparse4D v2: Recurrent Temporal Fusion with Sparse Model

Xuewu Lin; Tianwei Lin; Zixiang Pei; Lichao Huang; Zhizhong Su

arXiv:2305.14018·cs.CV·May 25, 2023·20 cites

TL;DR

Sparse4D v2 introduces a recurrent temporal fusion method for sparse perception that reduces the computational complexity of temporal fusion from O(T) to O(1) and enables long-term information integration, achieving state-of-the-art results on nuScenes 3D detection.

Contribution

The paper presents an improved version of Sparse4D whose temporal fusion module samples multi-frame features recursively, which reduces complexity and enables long-term feature integration for better perception performance.

Findings

01

Reduces temporal fusion complexity from O(T) to O(1).

02

Achieves state-of-the-art results on nuScenes 3D detection.

03

Improves inference speed and memory usage as a result of the lower temporal fusion complexity.

Abstract

Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling. By effectively decoupling image features and structured anchor features, Sparse4D enables a highly efficient transformation of temporal features, thereby facilitating temporal fusion solely through the frame-by-frame transmission of sparse features. The recurrent temporal fusion approach provides two main benefits. Firstly, it reduces the computational complexity of temporal fusion from $O(T)$ to $O(1)$, resulting in significant improvements in inference speed and memory usage. Secondly, it enables the fusion of long-term information, leading to more pronounced performance improvements due to temporal fusion. Our proposed approach,…
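
The complexity contrast described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: `SamplingCounter`, `window_fusion`, `recurrent_step`, the scalar "frames", and the momentum blend are all hypothetical stand-ins chosen only to show that a windowed scheme re-samples every frame it fuses, while a recurrent scheme samples the current frame once and carries a sparse state forward.

```python
from typing import List, Optional


class SamplingCounter:
    """Counts calls to the toy image-feature sampling step."""

    def __init__(self) -> None:
        self.calls = 0

    def sample(self, frame: float, anchors: List[float]) -> List[float]:
        # Hypothetical stand-in for sampling image features at each anchor.
        self.calls += 1
        return [frame * a for a in anchors]


def window_fusion(frames: List[float], anchors: List[float],
                  counter: SamplingCounter) -> List[float]:
    """Non-recurrent fusion: sample every frame in the window, then average."""
    sampled = [counter.sample(f, anchors) for f in frames]
    return [sum(col) / len(sampled) for col in zip(*sampled)]


def recurrent_step(frame: float, anchors: List[float],
                   state: Optional[List[float]], counter: SamplingCounter,
                   momentum: float = 0.5) -> List[float]:
    """Recurrent fusion: sample only the current frame, blend with the state."""
    current = counter.sample(frame, anchors)
    if state is None:
        return current
    # Placeholder blend (assumption); the paper's fusion module is not shown here.
    return [momentum * s + (1.0 - momentum) * c for s, c in zip(state, current)]


frames = [float(t) for t in range(1, 9)]  # T = 8 toy frames
anchors = [0.5, 1.0, 2.0]                 # three toy anchor weights

# Windowed scheme: producing one output touches all T frames.
window_counter = SamplingCounter()
window_out = window_fusion(frames, anchors, window_counter)

# Recurrent scheme: each new frame costs exactly one sampling call.
recurrent_counter = SamplingCounter()
state: Optional[List[float]] = None
calls_per_frame = []
for frame in frames:
    before = recurrent_counter.calls
    state = recurrent_step(frame, anchors, state, recurrent_counter)
    calls_per_frame.append(recurrent_counter.calls - before)

print("window calls for one output:", window_counter.calls)
print("recurrent calls per frame:", calls_per_frame)
```

In this toy setting the windowed scheme makes as many sampling calls as there are frames in the window for a single output, whereas the recurrent scheme makes one call per incoming frame regardless of how much history the carried state summarizes.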

Code & Models

Repositories

linxuewu/sparse4d
PyTorch · Official

Taxonomy

Topics: Advanced Image Fusion Techniques · Photoacoustic and Ultrasonic Imaging · Image Enhancement Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings