Occlusion-Aware Multimodal Beam Prediction and Pose Estimation for mmWave V2I
Abidemi Orimogunje, Hyunwoo Park, Kyeong-Ju Cha, Igbafe Orikumhi, Sunwoo Kim, Dejan Vukobratovic

TL;DR
This paper introduces an occlusion-aware multimodal learning framework, built on a Transformer fusion network, for beam prediction and pose estimation in mmWave V2I systems under dynamic blockage. It fuses RGB images, LiDAR point clouds, radar range-angle maps, GNSS, and short-term mmWave power history.
Contribution
It presents a multimodal fusion approach, inspired by SLAM concepts, that jointly predicts the receive beam index, blockage probability, and 2D position, and outperforms radio-only and camera-only baselines.
Findings
Achieves 50.92% Top-1 and 86.50% Top-3 beam accuracy on the 60 GHz DeepSense 6G Scenario 31 dataset.
Multimodal fusion outperforms radio-only and camera-only baselines.
Estimates 2D position with 1.33 m RMSE.
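The sketch below computes the Top-k beam accuracy and 2D position RMSE reported above. It also computes the blocked-class F1 reported in the abstract. It is a minimal NumPy sketch that assumes the usual definitions:

* RMSE is the square root of the mean squared Euclidean error.
* F1 is computed on the "blocked" class only.

The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def top_k_accuracy(logits, labels, k):
    """Fraction of samples whose true beam index is among the k highest-scoring beams."""
    # argsort is ascending, so the last k columns hold the k largest scores
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


def position_rmse(pred_xy, true_xy):
    """Root-mean-square Euclidean error of 2D positions (same units as the inputs, e.g. metres)."""
    sq_err = np.sum((pred_xy - true_xy) ** 2, axis=1)
    return float(np.sqrt(sq_err.mean()))


def blocked_f1(pred_blocked, true_blocked):
    """F1 score of the positive ('blocked') class from boolean arrays."""
    tp = np.sum(pred_blocked & true_blocked)
    fp = np.sum(pred_blocked & ~true_blocked)
    fn = np.sum(~pred_blocked & true_blocked)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


if __name__ == "__main__":
    # Toy example with 3 beams and 2 samples (not data from the paper)
    logits = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
    labels = np.array([2, 0])
    print(top_k_accuracy(logits, labels, 1))  # 0.5
    print(top_k_accuracy(logits, labels, 2))  # 1.0
```

Top-1 and Top-3 accuracy as quoted in the findings correspond to `k=1` and `k=3` over the 64 beam scores.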
Abstract
We propose an occlusion-aware multimodal learning framework that is inspired by simultaneous localization and mapping (SLAM) concepts for trajectory interpretation and pose prediction. Targeting mmWave vehicle-to-infrastructure (V2I) beam management under dynamic blockage, our Transformer-based fusion network ingests synchronized RGB images, LiDAR point clouds, radar range-angle maps, GNSS, and short-term mmWave power history. It jointly predicts the receive beam index, blockage probability, and 2D position using labels automatically derived from 64-beam sweep power vectors, while an offline LiDAR map enables SLAM-style trajectory visualization. On the 60 GHz DeepSense 6G Scenario 31 dataset, the model achieves 50.92% Top-1 and 86.50% Top-3 beam accuracy with 0.018 bits/s/Hz spectral-efficiency loss, 63.35% blocked-class F1, and 1.33 m position RMSE. Multimodal fusion outperforms…
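The abstract states that labels are derived automatically from 64-beam sweep power vectors, but it does not give the exact rules. The sketch below shows one plausible reading, under these assumptions:

* The beam label is the argmax of each power vector, a common convention for beam-sweep data.
* The blockage rule is purely hypothetical. It flags a sample when its peak power falls more than a hypothetical `drop_db` threshold below the sequence's median peak power.
* The spectral-efficiency loss assumes linear SNR = power / `noise_power` (a hypothetical parameter) and efficiency log2(1 + SNR).

None of these function names come from the authors' code.

```python
import numpy as np


def beam_labels(power):
    """Optimal receive-beam index per sample: the beam with the highest swept power.

    power: array of shape (N, 64) holding linear received power from the beam sweep.
    """
    return np.argmax(power, axis=1)


def blockage_labels(power, drop_db=6.0):
    """HYPOTHETICAL rule: a sample is 'blocked' when its peak beam power is more than
    `drop_db` dB below the median peak power over the whole sequence."""
    peak_db = 10.0 * np.log10(np.max(power, axis=1))
    return peak_db < np.median(peak_db) - drop_db


def spectral_efficiency_loss(power, pred_beam, noise_power=1.0):
    """Mean gap in bits/s/Hz between the optimal and the predicted beam.

    ASSUMES SNR = power / noise_power and spectral efficiency = log2(1 + SNR).
    """
    snr = power / noise_power
    rows = np.arange(power.shape[0])
    se_best = np.log2(1.0 + snr[rows, np.argmax(snr, axis=1)])
    se_pred = np.log2(1.0 + snr[rows, pred_beam])
    return float(np.mean(se_best - se_pred))


if __name__ == "__main__":
    # Toy sweep: 3 samples x 64 beams, weak floor with one strong beam each (not real data)
    power = np.full((3, 64), 1e-3)
    power[0, 10], power[1, 20], power[2, 30] = 1.0, 1.0, 0.01
    print(beam_labels(power))      # [10 20 30]
    print(blockage_labels(power))  # [False False  True]
    print(spectral_efficiency_loss(power, np.array([10, 20, 31])))
```

A correct beam prediction contributes zero loss. A wrong prediction costs the difference in achievable rate, so the metric penalizes mistakes on strong links more than mistakes on weak ones.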
