RCDPT: Radar-Camera fusion Dense Prediction Transformer
Chen-Chou Lo, Patrick Vandewalle

TL;DR
This paper introduces RCDPT, a radar-camera fusion method built on a dense prediction transformer. It improves depth estimation by integrating radar data without relying on readout tokens, and it outperforms existing convolutional depth estimation models.
Contribution
The paper proposes a new radar-camera fusion strategy for dense prediction transformers that improves depth estimation performance by reassembling camera and radar representations.
Findings
Outperforms existing convolutional depth estimation models with radar integration.
Provides a better fusion strategy than commonly used approaches on the nuScenes dataset.
Enhances monocular depth estimation accuracy using radar data.
Abstract
Recently, transformer networks have outperformed traditional deep neural networks in natural language processing and have shown great potential in many computer vision tasks compared to convolutional backbones. In the original transformer, readout tokens are used as designated vectors for aggregating information from other tokens. However, the performance gain from using readout tokens in a vision transformer is limited. Therefore, we propose a novel fusion strategy that integrates radar data into a dense prediction transformer network by reassembling camera representations with radar representations. Instead of using readout tokens, radar representations contribute additional depth information to a monocular depth estimation model and improve its performance. We further investigate different fusion approaches that are commonly used for integrating an additional modality in a dense prediction transformer…
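The idea of reassembling camera representations with radar representations, rather than merging in a readout token, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `reassemble` and `fuse_by_concat`, the channel-wise concatenation followed by a linear projection, the random weights, and all shapes are assumptions chosen only to show the general pattern of dropping the readout token and fusing two token grids.

```python
import numpy as np


def reassemble(tokens, grid_h, grid_w, has_readout=True):
    """Reshape a (N[+1], D) token sequence into a (D, grid_h, grid_w) feature map.

    Hypothetical helper: when a readout token is present it is simply dropped
    (ignored) rather than merged into the patch tokens.
    """
    if has_readout:
        tokens = tokens[1:]  # discard the leading readout token
    assert tokens.shape[0] == grid_h * grid_w, "token count must match the grid"
    return tokens.reshape(grid_h, grid_w, -1).transpose(2, 0, 1)


def fuse_by_concat(cam_map, radar_map, weight):
    """Concatenate two feature maps along channels, then project per pixel.

    Assumed fusion rule (not taken from the paper): weight has shape
    (D_out, D_cam + D_radar) and acts like a 1x1 convolution.
    """
    stacked = np.concatenate([cam_map, radar_map], axis=0)
    return np.einsum("oc,chw->ohw", weight, stacked)


rng = np.random.default_rng(0)
grid_h, grid_w, dim = 4, 6, 8  # illustrative sizes only

# Camera tokens carry one extra readout token; radar tokens do not.
cam_tokens = rng.normal(size=(1 + grid_h * grid_w, dim))
radar_tokens = rng.normal(size=(grid_h * grid_w, dim))
weight = rng.normal(size=(dim, 2 * dim))

cam_map = reassemble(cam_tokens, grid_h, grid_w, has_readout=True)
radar_map = reassemble(radar_tokens, grid_h, grid_w, has_readout=False)
fused = fuse_by_concat(cam_map, radar_map, weight)

print(fused.shape)  # (8, 4, 6)
```

In this sketch the radar feature map supplies the extra information that the readout token would otherwise contribute, and the fused map keeps the camera map's spatial layout so a dense prediction head could consume it unchanged.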
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications · Advanced Optical Sensing Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Residual Connection · Vision Transformer
