A Point-Based Approach to Efficient LiDAR Multi-Task Perception
Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada

TL;DR
PAttFormer is a point-based multi-task architecture for LiDAR perception that is smaller and faster than previous multi-task methods while maintaining competitive accuracy for semantic segmentation and object detection.
Contribution
The paper introduces a transformer-based, point-only architecture that shares a single feature encoder across tasks instead of using multiple task-specific encoders, improving efficiency in LiDAR multi-task perception.
Findings
3x smaller network size compared to previous methods
1.4x faster inference speed
Improved segmentation and detection accuracy on nuScenes
Abstract
Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder, which makes the network structures bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that relies only on a point-based representation. The network builds on transformer-based feature encoders that use neighborhood attention and grid pooling, and on a query-based detection decoder with a novel 3D deformable-attention detection head design. Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for multiple task-specific…
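To make the two encoder building blocks named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a point's neighborhood is its k nearest points, that attention uses scaled dot-product scores with no learned projections, and that grid pooling averages all points falling into the same cubic cell. The function names `neighborhood_attention` and `grid_pool`, and the parameters `k` and `cell`, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def neighborhood_attention(points, feats, k):
    """Each point attends only to its k nearest points (itself included).

    Assumption: identity query/key/value projections; a real encoder
    would use learned linear layers and multiple heads.
    """
    out = []
    for i, p in enumerate(points):
        # Indices of the k nearest neighbours by Euclidean distance.
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        nbrs = order[:k]
        scale = math.sqrt(len(feats[i]))
        scores = [sum(a * b for a, b in zip(feats[i], feats[j])) / scale
                  for j in nbrs]
        # Numerically stable softmax over the neighbourhood only.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        dim = len(feats[i])
        out.append([sum(weights[n] / z * feats[j][d] for n, j in enumerate(nbrs))
                    for d in range(dim)])
    return out


def grid_pool(points, feats, cell):
    """Downsample by averaging the points and features in each grid cell."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell) for c in p)
        buckets[key].append(i)
    pooled_pts, pooled_feats = [], []
    for key in sorted(buckets):
        idx = buckets[key]
        n = len(idx)
        pooled_pts.append([sum(points[i][d] for i in idx) / n for d in range(3)])
        pooled_feats.append([sum(feats[i][d] for i in idx) / n
                             for d in range(len(feats[0]))])
    return pooled_pts, pooled_feats


# Toy point cloud: two points in each of two 1 m grid cells.
points = [[0.1, 0.1, 0.0], [0.3, 0.2, 0.0], [1.2, 0.1, 0.0], [1.4, 0.3, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]

attended = neighborhood_attention(points, feats, k=2)
pooled_pts, pooled_feats = grid_pool(points, attended, cell=1.0)
```

In this sketch the attention cost grows with the neighborhood size `k` rather than with the total number of points, and each pooling step reduces the number of points the next stage has to process. Those are the usual reasons for pairing local attention with pooling in point-based encoders.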
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: Neighborhood Attention
