Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception
Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Rui Pan, Yujia Yang, Congzhang Shao, Yuewen Liu, Jinglin Li

TL;DR
CoPLOT introduces point-level tokens for collaborative perception, effectively capturing 3D structural cues and improving object recognition and localization while reducing communication and computation costs.
Contribution
The paper proposes CoPLOT, a novel framework utilizing point-native processing, semantic-aware token reordering, and multi-agent alignment to enhance collaborative perception.
Findings
Outperforms state-of-the-art models on multiple datasets.
Reduces communication and computation overhead.
Effectively captures 3D structural cues for better perception.
Abstract
Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard the fine-grained 3D structural cues essential for accurate object recognition and localization. To address this, we first introduce point-level tokens as intermediate representations for collaborative perception. However, point-cloud data are inherently unordered, massive, and position-sensitive, which makes it challenging to produce compact and aligned point-level token sequences that preserve detailed structural information. We therefore present CoPLOT, a novel Collaborative perception framework that utilizes Point-Level Optimized Tokens. It incorporates a point-native processing pipeline, including token reordering, sequence modeling, and multi-agent…
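Illustration
The abstract's claim that BEV representations lose vertical structure can be made concrete with a toy example. The sketch below is not CoPLOT's pipeline; it assumes a deliberately crude count-only BEV occupancy grid (learned BEV encoders usually keep some height statistics), and the function names, cell size, and point data are all hypothetical. It shows two differently shaped objects collapsing to the same BEV cell, while the raw points still tell them apart.

```python
from collections import defaultdict


def bev_cell(point, cell_size=0.5):
    """Map a 3D point (x, y, z) to a 2D grid cell; z is ignored."""
    x, y, _z = point
    return (int(x // cell_size), int(y // cell_size))


def bev_occupancy(points, cell_size=0.5):
    """Count points per BEV cell (a crude, count-only BEV feature)."""
    grid = defaultdict(int)
    for p in points:
        grid[bev_cell(p, cell_size)] += 1
    return dict(grid)


def z_extent(points):
    """Vertical extent of a point set, available only at point level."""
    zs = [p[2] for p in points]
    return max(zs) - min(zs)


# Hypothetical toy data: a tall pole and a flat ground patch
# that fall into the same 0.5 m x 0.5 m BEV cell.
pole = [(1.1, 2.1, 0.2), (1.1, 2.1, 1.0), (1.1, 2.1, 1.8)]
patch = [(1.1, 2.1, 0.1), (1.2, 2.2, 0.1), (1.3, 2.3, 0.1)]

if __name__ == "__main__":
    print("BEV pole: ", bev_occupancy(pole))
    print("BEV patch:", bev_occupancy(patch))
    print("same BEV feature:", bev_occupancy(pole) == bev_occupancy(patch))
    print("pole height extent > patch height extent:",
          z_extent(pole) > z_extent(patch))
```

In this toy setting the two objects are indistinguishable after BEV projection but clearly different in their vertical extent, which is the kind of cue the abstract argues point-level tokens can retain.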
