Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud
Zhensong Wei, Xuewei Qi, Zhengwei Bai, Guoyuan Wu, Saswat Nayak, Peng Hao, Matthew Barth, Yongkang Liu, and Kentaro Oguchi

TL;DR
This paper introduces a novel spatiotemporal transformer attention network that jointly performs 3D voxel-level segmentation and motion prediction from point cloud sequences, enhancing environment perception for autonomous driving.
Contribution
It presents a new transformer-based backbone with temporal and spatial attention modules that performs multiple perception tasks simultaneously, directly from point cloud data.
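The temporal and spatial attention modules mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a (T, N, C) feature layout (T sweeps, N voxels, C channels), single-head scaled dot-product self-attention, residual connections, and a temporal-then-spatial ordering. None of these details is given in this summary. The names `self_attention` and `spatiotemporal_block`, and the random projection weights, are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over axis -2.

    x: (..., L, C) batch of sequences with L tokens of C channels.
    wq, wk, wv: (C, C) projection matrices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def spatiotemporal_block(feats, params):
    """Hypothetical block: temporal attention, then spatial attention.

    Not the paper's design; the ordering, single head, and residual
    connections are assumptions made for illustration.
    feats: (T, N, C) features for N voxels over T sweeps.
    params: {"temporal": [wq, wk, wv], "spatial": [wq, wk, wv]}.
    """
    # Temporal attention: each voxel attends over its own T time steps.
    per_voxel = np.swapaxes(feats, 0, 1)                 # (N, T, C)
    per_voxel = per_voxel + self_attention(per_voxel, *params["temporal"])
    feats = np.swapaxes(per_voxel, 0, 1)                 # (T, N, C)
    # Spatial attention: within each sweep, every voxel attends to all others.
    return feats + self_attention(feats, *params["spatial"])


# Usage with random weights.
rng = np.random.default_rng(0)
T, N, C = 5, 16, 8
params = {name: [0.1 * rng.standard_normal((C, C)) for _ in range(3)]
          for name in ("temporal", "spatial")}
out = spatiotemporal_block(rng.standard_normal((T, N, C)), params)
print(out.shape)  # (5, 16, 8)
```

Factorizing attention this way keeps each attention map small: T x T per voxel and N x N per sweep, rather than one (T·N) x (T·N) map. Whether the paper factorizes in this manner is not stated here.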
Findings
Achieved promising performance on the nuScenes dataset.
Effectively combines segmentation and motion prediction in a single model.
Learns complex spatiotemporal features directly from point cloud sequences.
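Learning at the voxel level from point cloud sequences first requires converting each sweep into a voxel grid. The NumPy sketch below shows that step under stated assumptions. The sweeps are taken to be already aligned to a common ego frame, and binary occupancy serves as the only voxel feature. `voxelize_sequence` and its grid parameters are hypothetical. The paper's actual voxel size, range, and per-voxel features are not given in this summary.

```python
import numpy as np


def voxelize_sequence(sweeps, grid_min, voxel_size, grid_shape):
    """Hypothetical helper: T sweeps -> (T, X, Y, Z) binary occupancy grid.

    sweeps: list of (P_t, 3) xyz arrays, assumed already aligned to a
            common ego frame (ego-motion compensation is not shown).
    grid_min, voxel_size, grid_shape: illustrative grid parameters, not
            the values used in the paper.
    """
    grid_min = np.asarray(grid_min, dtype=float)
    voxel_size = np.asarray(voxel_size, dtype=float)
    grid_shape = np.asarray(grid_shape, dtype=int)
    occ = np.zeros((len(sweeps), *grid_shape), dtype=np.uint8)
    for t, pts in enumerate(sweeps):
        # Integer voxel index of every point along x, y, z.
        idx = np.floor((np.asarray(pts, dtype=float) - grid_min) / voxel_size).astype(int)
        # Discard points that fall outside the grid.
        inside = np.all((idx >= 0) & (idx < grid_shape), axis=1)
        idx = idx[inside]
        occ[t, idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return occ


sweeps = [
    np.array([[0.1, 0.1, 0.1], [1.5, 0.2, 0.3]]),
    np.array([[0.1, 0.1, 0.1], [9.0, 9.0, 9.0]]),  # second point is out of range
]
occ = voxelize_sequence(sweeps, grid_min=(0, 0, 0), voxel_size=(1, 1, 1),
                        grid_shape=(4, 4, 2))
print(occ.shape, int(occ.sum()))  # (2, 4, 4, 2) 3
```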
Abstract
Environment perception, including detection, classification, tracking, and motion prediction, is a key enabler for automated driving systems and intelligent transportation applications. Fueled by advances in sensing technologies and machine learning techniques, LiDAR-based sensing systems have become a promising solution. This solution faces two current challenges: how to effectively combine different perception tasks into a single backbone, and how to efficiently learn spatiotemporal features directly from point cloud sequences. In this research, we propose a novel spatiotemporal attention network based on a transformer self-attention mechanism for joint semantic segmentation and motion prediction within a point cloud at the voxel level. The network is trained to simultaneously output the voxel-level class and predicted motion by learning directly from a sequence of point cloud…
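One common way to produce two outputs from a single backbone is to attach two lightweight heads to shared per-voxel features and train them with a summed loss. The sketch below illustrates that pattern under stated assumptions: linear heads, a 2-D (x, y) displacement as the motion target, cross-entropy plus L1 as the loss, and a weight `alpha` between the two terms. `joint_heads`, `joint_loss`, and `alpha` are hypothetical. The abstract does not specify the heads or the loss.

```python
import numpy as np


def joint_heads(features, w_cls, w_motion):
    """Hypothetical linear heads on shared per-voxel backbone features.

    features: (N, C); w_cls: (C, K) -> K class logits per voxel;
    w_motion: (C, 2) -> an assumed 2-D (x, y) displacement per voxel.
    """
    return features @ w_cls, features @ w_motion


def joint_loss(logits, motion, labels, motion_gt, alpha=1.0):
    """Assumed multi-task loss: cross-entropy + alpha * L1 motion error."""
    # Stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # Per-voxel L1 distance between predicted and target displacement.
    l1 = np.abs(motion - motion_gt).sum(axis=1).mean()
    return ce + alpha * l1


rng = np.random.default_rng(0)
N, C, K = 6, 8, 5
feats = rng.standard_normal((N, C))
logits, motion = joint_heads(feats, rng.standard_normal((C, K)),
                             rng.standard_normal((C, 2)))
labels = rng.integers(0, K, size=N)          # toy ground-truth classes
motion_gt = rng.standard_normal((N, 2))      # toy ground-truth displacements
loss = joint_loss(logits, motion, labels, motion_gt, alpha=0.5)
```

Both loss terms back-propagate through the same `features`, so the segmentation and motion tasks jointly shape the shared backbone. The relative weight `alpha` would normally be tuned rather than fixed.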
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Advanced Neural Network Applications · 3D Surveying and Cultural Heritage
Methods: Max Pooling · Convolution · Average Pooling · Sigmoid Activation
