S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
Maciej K. Wozniak, Hariprasath Govindarajan, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani

TL;DR
This paper introduces S3PT, a scene-aware self-supervised pre-training method for autonomous driving that leverages semantic, spatial, and depth information to improve downstream detection and segmentation performance.
Contribution
S3PT proposes scene semantics and structure guided clustering techniques, including semantic distribution, spatial diversity, and depth-guided clustering, to enhance self-supervised pre-training for autonomous driving.
Findings
Improves semantic segmentation accuracy on nuScenes, nuImages, and Cityscapes.
Enhances 3D object detection performance.
Shows promising domain translation capabilities.
Abstract
Recent self-supervised clustering-based pre-training techniques like DINO and CrIBo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT, a novel scene semantics and structure guided clustering approach that provides more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we…
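As an illustration of the first contribution, the sketch below shows one way a clustering objective could be made to follow a non-uniform class distribution. It is an assumption for illustration only, not the authors' implementation: the abstract does not say how S3PT enforces semantic distribution consistency. Clustering-based objectives commonly balance cluster assignments with Sinkhorn-Knopp iterations under a uniform cluster prior; here the hypothetical function `sinkhorn_assign` accepts an arbitrary `cluster_prior` instead, so a long-tailed prior lets a few clusters absorb frequent content while the remaining clusters stay small. The names `sinkhorn_assign`, `cluster_prior`, `eps`, and `n_iters` are all hypothetical.

```python
import numpy as np


def sinkhorn_assign(scores, cluster_prior, n_iters=100, eps=0.5):
    """Soft-assign N samples to K clusters so cluster mass follows a prior.

    Illustrative sketch only (not the S3PT implementation).
    scores:        (N, K) similarities between samples and cluster prototypes.
    cluster_prior: (K,) target fraction of samples per cluster, summing to 1.
    Returns an (N, K) matrix whose rows each sum to 1.
    """
    n, _ = scores.shape
    # Subtract the global max before exponentiating for numerical stability.
    q = np.exp((scores - scores.max()) / eps)
    q /= q.sum()
    row_target = np.full(n, 1.0 / n)  # every sample carries equal mass
    for _ in range(n_iters):
        # Rescale columns so each cluster receives its prior share of mass.
        q *= (cluster_prior / q.sum(axis=0))[None, :]
        # Rescale rows so each sample keeps total mass 1/N.
        q *= (row_target / q.sum(axis=1))[:, None]
    return q * n  # convert joint mass to per-sample assignment probabilities


# Usage with random unit-norm features and a long-tailed (assumed) prior.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
protos = rng.normal(size=(4, 16))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
prior = np.array([0.7, 0.2, 0.08, 0.02])
assignments = sinkhorn_assign(feats @ protos.T, prior)
```

With a uniform `cluster_prior` this reduces to the usual equipartition constraint; the only change in the sketch is the column target, which is what makes the assignment follow an imbalanced distribution.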
Taxonomy
Topics: Data Management and Algorithms · Data Visualization and Analytics · Time Series Analysis and Forecasting
Methods: Softmax · Linear Layer · Dense Connections · Layer Normalization · Residual Connection · Attention Is All You Need · Multi-Head Attention · Vision Transformer · self-DIstillation with NO labels
