Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds
Junbo Yin, Jianbing Shen, Xin Gao, David Crandall, Ruigang Yang

TL;DR
This paper detects 3D objects in point cloud videos by combining short-term temporal cues, encoded with a graph neural network, with long-term cues, aggregated by a spatiotemporal transformer, and reports state-of-the-art results.
Contribution
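The paper proposes a Grid Message Passing Network (GMPNet) for short-term motion encoding and an Attentive Spatiotemporal Transformer GRU (AST-GRU) for long-term frame aggregation, both aimed at improving detection accuracy on point cloud videos.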
Findings
Achieved 1st place on the nuScenes leaderboard.
Outperformed existing methods in short-term motion modeling.
Effectively integrated spatial and temporal attention modules.
Abstract
Previous works for LiDAR-based 3D object detection mainly focus on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., the point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which considers each grid (i.e., the grouped points) as a node and constructs a k-NN graph with the neighbor grids. To update features for a grid, GMPNet iteratively collects information from its neighbors, thus mining the motion cues in grids from nearby frames. To further aggregate the long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA…
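The grid-level message passing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`knn_indices`, `message_passing`), the edge feature `[h_j - h_i, h_j]`, the max aggregation, the `tanh` update, and the random weights are all assumptions chosen for illustration, and grids from nearby frames are simply pooled into one node set.

```python
import numpy as np


def knn_indices(centers, k):
    """Return, for each grid, the indices of its k nearest other grids."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                 # a grid is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]         # (N, k)


def message_passing(features, neighbors, w_msg, w_upd, steps=2):
    """Iteratively update each grid feature from its k-NN neighbors.

    Hypothetical update rule (an assumption, not taken from the paper):
      message m_ij = ReLU([h_j - h_i, h_j] @ w_msg)
      aggregate    = max over neighbors j
      update   h_i = tanh([h_i, aggregate] @ w_upd)
    """
    h = features
    for _ in range(steps):
        h_nbr = h[neighbors]                                   # (N, k, C)
        h_self = np.broadcast_to(h[:, None, :], h_nbr.shape)   # (N, k, C)
        edge_in = np.concatenate([h_nbr - h_self, h_nbr], axis=-1)
        msgs = np.maximum(edge_in @ w_msg, 0.0)                # (N, k, C)
        agg = msgs.max(axis=1)                                 # (N, C)
        h = np.tanh(np.concatenate([h, agg], axis=-1) @ w_upd)
    return h


# Toy data: 6 grids with 2D centers and 4-channel features (all made up).
rng = np.random.default_rng(0)
num_grids, channels, k = 6, 4, 2
centers = rng.uniform(0.0, 10.0, size=(num_grids, 2))
features = rng.normal(size=(num_grids, channels))
w_msg = rng.normal(scale=0.1, size=(2 * channels, channels))
w_upd = rng.normal(scale=0.1, size=(2 * channels, channels))

neighbors = knn_indices(centers, k)
updated = message_passing(features, neighbors, w_msg, w_upd)
print(updated.shape)
```

Each extra step widens the region a grid can draw on by one hop in the k-NN graph, which is one way to read the abstract's claim that iterating lets a grid mine motion cues from nearby frames.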
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Dropout · Adam · Byte Pair Encoding · Spatial Transformer · Label Smoothing
