Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing

Haichuan Li; Tomi Westerlund

arXiv:2507.19691 · cs.CV · July 29, 2025

TL;DR

Co-Win introduces a novel BEV perception framework that combines point cloud encoding with window-based feature extraction and mask-based segmentation to improve object detection and scene understanding in autonomous driving.

Contribution

The paper presents a hierarchical architecture with a variational, mask-based segmentation approach for joint object detection and instance segmentation in LiDAR point clouds.

Findings

01 Enhanced scene decomposition accuracy
02 Improved object detection performance
03 More interpretable instance predictions

Abstract

Accurate perception and scene understanding in complex urban environments is a critical challenge for ensuring safe and efficient autonomous navigation. In this paper, we present Co-Win, a novel bird's eye view (BEV) perception framework that integrates point cloud encoding with efficient parallel window-based feature extraction to address the multi-modality inherent in environmental understanding. Our method employs a hierarchical architecture comprising a specialized encoder, a window-based backbone, and a query-based decoder head to effectively capture diverse spatial features and object relationships. Unlike prior approaches that treat perception as a simple regression task, our framework incorporates a variational approach with mask-based instance segmentation, enabling fine-grained scene decomposition and understanding. The Co-Win architecture processes point cloud data through…
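The abstract is truncated here, so the details of the window-based backbone are not available on this page. As a rough illustration of what window-based processing of a BEV grid generally involves, the sketch below splits a toy grid into non-overlapping square tiles that could then be handled independently and in parallel. The function name `partition_windows`, the 4x4 grid, and the window size of 2 are all assumptions made for illustration; this is not the authors' implementation.

```python
def partition_windows(grid, window):
    """Split a 2D grid (a list of rows) into non-overlapping window x window tiles.

    Tiles are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid height and width must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Slice `window` rows, then `window` columns from each of those rows.
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


# Hypothetical 4x4 BEV grid; each cell holds a scalar standing in for a feature vector.
bev = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = partition_windows(bev, window=2)
```

Each tile depends only on its own cells, which is what makes per-window feature extraction easy to parallelize. Real backbones of this kind usually add some mechanism for exchanging information across window borders, which this sketch leaves out.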
