IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Shaohong Wang; Lu Bin; Xinyu Xiao; Zhiyu Xiang; Hangguan Shan; Eryun Liu

arXiv:2407.09857 · cs.CV · July 16, 2024

TL;DR

This paper introduces IFTR, an instance-level fusion transformer for camera-only collaborative perception in autonomous driving. It targets feature sharing, feature fusion, and communication efficiency, and reports an improvement of up to 57.96% in AP@70 on the DAIR-V2X dataset.

Contribution

The paper proposes an instance-level fusion transformer (IFTR) that effectively fuses multi-agent visual features for collaborative perception, addressing the gap in camera-based systems and optimizing communication.

Findings

01  Achieved an improvement of up to 57.96% in AP@70 on the DAIR-V2X dataset.

02  Outperformed state-of-the-art methods on multiple datasets.

03  Validated the effectiveness of the key components through extensive experiments.

Abstract

Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of the advantages offered by the camera modality. This work proposes an instance-level fusion transformer for visual collaborative perception (IFTR), which enhances the detection performance of camera-only collaborative perception systems through the communication and sharing of visual features. To capture the visual information from multiple agents, we design an instance feature aggregation that interacts with the visual features of individual agents using predefined grid-shaped bird eye view…

Code & Models

Repositories

wangsh0111/iftr (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: Softmax · Attention Is All You Need