An Extensible Framework for Open Heterogeneous Collaborative Perception

Yifan Lu; Yue Hu; Yiqi Zhong; Dequan Wang; Yanfeng Wang; Siheng Chen

arXiv:2401.13964·cs.CV·April 2, 2024·2 cites

TL;DR

This paper introduces HEAL, an extensible framework for collaborative perception that effectively integrates new heterogeneous agents with minimal retraining, demonstrated on new datasets and outperforming state-of-the-art methods.

Contribution

Proposes HEAL, a novel framework enabling seamless integration of emerging heterogeneous agents into collaborative perception with low training costs.

Findings

1. HEAL outperforms SOTA methods on the OPV2V-H and DAIR-V2X datasets.
2. HEAL reduces training parameters by 91.5% when adding new agent types.
3. Introduces OPV2V-H, a large-scale dataset with diverse sensor modalities.
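
To make the parameter-reduction finding concrete, the sketch below shows how a percentage of this kind can be computed. It assumes that integrating a new agent type trains only that agent's encoder while the shared fusion network and detection head stay frozen, and compares this with retraining every module. The module names and parameter counts are hypothetical placeholders, not values from the paper, so the output does not reproduce the 91.5% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A model component with a parameter count and a trainable flag."""
    name: str
    num_params: int
    trainable: bool


def trainable_params(modules):
    """Sum the parameters of all modules that would receive gradient updates."""
    return sum(m.num_params for m in modules if m.trainable)


def reduction_percent(baseline, proposed):
    """Percentage reduction in trainable parameters relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical sizes, chosen only for illustration.
ENCODER, FUSION, HEAD = 2_000_000, 6_000_000, 2_000_000

# Baseline (assumption): every module is retrained when a new agent type joins.
retrain_all = [
    Module("new_agent_encoder", ENCODER, True),
    Module("shared_fusion", FUSION, True),
    Module("detection_head", HEAD, True),
]

# Assumed low-cost scheme: shared modules are frozen, only the new encoder trains.
encoder_only = [
    Module("new_agent_encoder", ENCODER, True),
    Module("shared_fusion", FUSION, False),
    Module("detection_head", HEAD, False),
]

baseline = trainable_params(retrain_all)
proposed = trainable_params(encoder_only)
print(f"{reduction_percent(baseline, proposed):.1f}% fewer trainable parameters")
```

With these placeholder sizes, the encoder holds one fifth of all parameters, so freezing everything else removes four fifths of the trainable parameters. The actual saving depends on the real module sizes and on how many agent types are added.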

Abstract

Collaborative perception aims to mitigate the limitations of single-agent perception, such as occlusions, by facilitating data exchange among multiple agents. However, most current works consider a homogeneous scenario where all agents use identical sensors and perception models. In reality, heterogeneous agent types may continually emerge and inevitably face a domain gap when collaborating with existing agents. In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost? To address this problem, we propose HEterogeneous ALliance (HEAL), a novel extensible collaborative perception framework. HEAL first establishes a unified feature space with initial agents via a novel multi-scale foreground-aware Pyramid Fusion…

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating: 8 (accept, good paper) · Confidence: 5

Strengths

1. Presents a highly intriguing new question, effectively addressing the challenges faced during the deployment of multi-agent collaborative perception systems.
2. The approach in this paper is notably succinct and efficient; the authors design a novel backward alignment mechanism for individual training. This method constructs an alignable feature space, facilitating subsequent updates of features transmitted by other agents.

Weaknesses

1. The intermediate fusion method employed in this paper does not seem to address the issue of new agents joining as effectively as late fusion does.
2. The paper reports experiments on only two datasets, one of which is introduced in this paper. It is hoped that the authors can add more experiments to substantiate the results.

Reviewer 02 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 5

Strengths

1. The paper introduces an interesting open heterogeneous collaborative perception setting in which agents with different sensors can collaborate on vision tasks. This is an interesting and practical setting.
2. Multi-scale feature fusion and the 'late participation' strategy are reasonable for such tasks.
3. The paper contributes a dataset. The experiments are extensive, and the presentation is good.

Weaknesses

1. There are no real-world experiments. Some datasets, such as nuScenes/nuPlan and Waymo, include data from different sensors. It would be nice to show some real examples.
2. It would be interesting to include a brief discussion of related work on cooperation for driving tasks, e.g. [1][2][3].
3. I did not find a code release. It would be nice to release the code as supplementary material or in a public GitHub repo.

[1] D. Chen et al. Learning from All Vehicles. CVPR 2022. [2] J. Cui et al. Coopernau

Reviewer 03 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 5

Strengths

* The proposed training design is simple yet effective for faster convergence and higher performance.
* The proposed methods achieve outstanding performance.

Weaknesses

* When comparing with other SOTA algorithms, is the same training strategy used for a fair comparison? Does the main performance boost come from the training strategy or from the multi-modal fusion design?
* The fusion model design (except the residual part) is similar to existing methods such as Who2com, Where2comm, and DiscoNet. Please justify and highlight the differences and novelty. Please also benchmark the performance under the same training strategy with only different fusion networks so as to demon

Code & Models

Repositories

yifanlu0227/heal
PyTorch · Official

Taxonomy

Topics: Robotics and Automated Systems · Semantic Web and Ontologies

Methods: ALIGN