S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality
Jinlong Li, Runsheng Xu, Xinyu Liu, Baolu Li, Qin Zou, Jiaqi Ma, Hongkai Yu

TL;DR
This paper introduces S2R-ViT, a novel transfer learning framework using Vision Transformers to bridge the domain gap between simulated and real multi-agent perception data, significantly improving 3D object detection performance.
Contribution
It presents the first simulation-to-reality transfer learning approach for multi-agent perception with a new Vision Transformer that addresses both deployment and feature domain gaps.
Findings
S2R-ViT outperforms existing methods on OPV2V and V2V4Real datasets.
The framework effectively reduces the domain gap between simulation and real data.
Experimental results show significant improvements in 3D object detection accuracy.
Abstract
Due to the lack of sufficient real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually use simulated sensor data for training and validation. However, perception performance degrades when these simulation-trained models are deployed in the real world, because of the significant domain gap between simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Deployment Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap and an agent-based feature adaptation module with inter-agent and ego-agent…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization
