Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing
Haichuan Li, Tomi Westerlund

TL;DR
Co-Win introduces a novel BEV perception framework that combines point cloud encoding with window-based feature extraction and mask-based segmentation to improve object detection and scene understanding in autonomous driving.
Contribution
The paper presents a hierarchical architecture with a variational, mask-based segmentation approach for joint object detection and instance segmentation in LiDAR point clouds.
Findings
Enhanced scene decomposition accuracy
Improved object detection performance
More interpretable instance predictions
Abstract
Accurate perception and scene understanding in complex urban environments is a critical challenge for ensuring safe and efficient autonomous navigation. In this paper, we present Co-Win, a novel bird's eye view (BEV) perception framework that integrates point cloud encoding with efficient parallel window-based feature extraction to address the multi-modality inherent in environmental understanding. Our method employs a hierarchical architecture comprising a specialized encoder, a window-based backbone, and a query-based decoder head to effectively capture diverse spatial features and object relationships. Unlike prior approaches that treat perception as a simple regression task, our framework incorporates a variational approach with mask-based instance segmentation, enabling fine-grained scene decomposition and understanding. The Co-Win architecture processes point cloud data through…
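The abstract mentions window-based feature extraction over a BEV representation but does not say how windows are formed. As a rough illustration only, the sketch below shows generic non-overlapping window partitioning of a 2D BEV grid, so that each window can be processed independently (and therefore in parallel). The function name `partition_windows`, the toy 4 x 4 grid, and the window size of 2 are assumptions made for this example; none of them is taken from the paper, and this is not the Co-Win implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_windows(grid: Sequence[Sequence[T]], window: int) -> List[List[T]]:
    """Split an H x W grid into non-overlapping window x window tiles.

    Tiles are returned in row-major order; the cells inside each tile
    are also flattened in row-major order.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    height = len(grid)
    width = len(grid[0]) if height else 0
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")

    tiles: List[List[T]] = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Collect the cells that fall inside this window.
            tiles.append([
                grid[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ])
    return tiles


# Toy BEV grid: each cell holds a scalar id in place of a feature vector.
bev = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = partition_windows(bev, window=2)
```

In a real backbone each cell would carry a feature vector rather than a scalar, and the per-window cell lists would be batched together so that a shared operator (for example, attention) runs over all windows at once.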
