Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds

Junbo Yin; Jianbing Shen; Xin Gao; David Crandall; Ruigang Yang

arXiv:2207.12659·cs.CV·November 29, 2022

TL;DR

This paper detects 3D objects in point cloud videos by combining short-term temporal information, encoded with a graph neural network, and long-term temporal information, aggregated with a spatiotemporal transformer, and reports state-of-the-art results on nuScenes.

Contribution

The paper proposes GMPNet, a grid message passing network for short-term motion encoding, and AST-GRU, an attentive spatiotemporal transformer GRU for long-term frame aggregation, to improve detection accuracy in point cloud videos.

Findings

01  Achieved 1st place on the nuScenes leaderboard.

02  Outperformed existing methods in short-term motion modeling.

03  Effectively integrated spatial and temporal attention modules.

Abstract

Previous works for LiDAR-based 3D object detection mainly focus on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., the point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which considers each grid (i.e., the grouped points) as a node and constructs a k-NN graph with the neighbor grids. To update features for a grid, GMPNet iteratively collects information from its neighbors, thus mining the motion cues in grids from nearby frames. To further aggregate the long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA…
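
The grid-level message passing described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `knn_graph` and `message_passing`, the mean aggregation, and the fixed 0.5 update weight are hypothetical choices made for brevity, whereas GMPNet learns its message and update functions. Each row of `centers` stands in for one grid (a group of points), and each row of `features` for that grid's feature vector.

```python
import numpy as np


def knn_graph(centers, k):
    """Return, for each grid, the indices of its k nearest neighbor grids.

    centers: (N, D) array of grid center coordinates (hypothetical input).
    """
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)            # a grid is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]    # (N, k) neighbor indices


def message_passing(features, centers, k=2, steps=2):
    """Iteratively update each grid feature from its k-NN neighbors.

    Assumption: the message is the feature difference to each neighbor,
    aggregated by a mean, and applied with a fixed weight of 0.5. A learned
    model would replace both steps with trainable functions.
    """
    neighbors = knn_graph(centers, k)
    h = np.asarray(features, dtype=float)
    for _ in range(steps):
        messages = h[neighbors] - h[:, None, :]   # (N, k, C) edge messages
        aggregated = messages.mean(axis=1)        # (N, C) per-node aggregate
        h = h + 0.5 * aggregated                  # placeholder update rule
    return h


if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
    features = np.array([[0.0], [2.0], [10.0], [14.0]])
    print(message_passing(features, centers, k=1, steps=1))
```

With more iterations, information from grids that are several hops away reaches each node, which is the sense in which repeated neighbor aggregation can pick up motion cues across nearby frames.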

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Dropout · Adam · Byte Pair Encoding · Spatial Transformer · Label Smoothing