AFFormer: Adaptive Feature Fusion Transformer for V2X Cooperative Perception under Channel Impairments
Xi Zhou, Tao Huang, Qing-Long Han, Rana Abbas, Mostafa Rahimi Azghadi

TL;DR
AFFormer is a Transformer-based framework designed to improve the robustness of V2X cooperative perception against communication channel impairments, enhancing autonomous vehicle safety.
Contribution
This work introduces AFFormer, a novel Transformer architecture with modules for multi-agent, temporal, and spatial feature fusion, plus a knowledge distillation strategy for robustness.
Findings
AFFormer outperforms existing methods when the communication channel is impaired.
It balances efficiency and accuracy.
The results are validated on the V2XSet and DAIR-V2X datasets.
Abstract
Accurate 3D object detection is essential for ensuring the safety of autonomous vehicles. Cooperative perception, which leverages vehicle-to-everything (V2X) communication to share perceptual data, enhances detection but is vulnerable to channel impairments, such as noise, fading, and interference. To strengthen the reliability of intelligent transportation systems, this work improves the robustness of V2X cooperative perception under communication conditions that reflect common channel impairments. This paper proposes an Adaptive Feature Fusion Transformer (AFFormer), a Transformer-based framework that mitigates the adverse effects of corrupted features by modeling temporal, inter-agent, and spatial correlations. AFFormer introduces three key modules: Multi-Agent and Temporal Aggregation for context-aware fusion across agents and over time, Dual Spatial Attention for efficient modeling…
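To make the problem setting concrete, here is a minimal sketch of how a channel impairment can corrupt a shared feature vector and how an attention-style fusion might weight corrupted inputs. It is an illustration under stated assumptions, not AFFormer's actual modules: the flat Rayleigh fading plus additive Gaussian noise model, the scaled dot-product weighting, and the names `impair`, `fuse`, and `snr_db` are all hypothetical choices made for this example.

```python
import math
import random


def impair(features, snr_db, rng):
    """Apply flat Rayleigh fading plus additive Gaussian noise to a feature vector.

    Assumption: one fading gain per transmission, and noise power set
    relative to the transmitted (pre-fading) signal power.
    """
    # Magnitude of a complex Gaussian with unit mean power is Rayleigh-distributed.
    sigma = math.sqrt(0.5)
    gain = math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    signal_power = sum(x * x for x in features) / len(features)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [gain * x + rng.gauss(0.0, noise_std) for x in features]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(ego, received):
    """Fuse ego and received features with scaled dot-product attention.

    The ego feature acts as the query; every agent's feature (ego included)
    acts as both key and value.
    """
    candidates = [ego] + received
    scale = math.sqrt(len(ego))
    scores = [sum(q * k for q, k in zip(ego, c)) / scale for c in candidates]
    weights = softmax(scores)
    fused = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(ego))
    ]
    return fused, weights


rng = random.Random(0)  # fixed seed so the run is repeatable
ego = [1.0, 0.5, -0.3, 0.8]
clean = [0.9, 0.6, -0.2, 0.7]  # what a collaborator intended to send

# The same feature sent over a mild (20 dB) and a harsh (0 dB) channel.
received = [impair(clean, snr_db=20, rng=rng), impair(clean, snr_db=0, rng=rng)]
fused, weights = fuse(ego, received)
print("attention weights:", [round(w, 3) for w in weights])
print("fused feature:", [round(v, 3) for v in fused])
```

A plain dot-product weighting like this has no notion of channel quality, so a heavily corrupted feature can still receive a large weight. That gap is the kind of failure a robustness-oriented fusion design has to address.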
