Vision Transformer Based User Equipment Positioning
Parshwa Shah, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez, Siddhartan Govindasamy

TL;DR
This paper introduces a Vision Transformer-based model for user equipment positioning using CSI data, significantly improving accuracy over existing methods in indoor and outdoor scenarios.
Contribution
The paper presents a novel ViT architecture tailored for CSI data, addressing limitations of previous models and achieving superior positioning accuracy.
Findings
Achieves an RMSE of 0.55m indoors and 13.59m outdoors on DeepMIMO, and 3.45m in ViWi's outdoor blockage scenario
Outperforms state-of-the-art schemes by approximately 38%
Yields a substantially better distribution of error distance than the other approaches considered
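The findings above are reported as RMSE and as a distribution of error distance. As a minimal sketch of how such metrics are typically computed, assuming positions are given as coordinate tuples in metres (the function names and sample values here are illustrative and not taken from the paper):

```python
import math

def error_distances(predicted, actual):
    """Euclidean distance between each predicted and true position."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]

def rmse(predicted, actual):
    """Root mean squared error over the per-sample error distances."""
    d = error_distances(predicted, actual)
    return math.sqrt(sum(x * x for x in d) / len(d))

def fraction_within(distances, threshold):
    """Empirical CDF value: share of samples with error <= threshold."""
    return sum(x <= threshold for x in distances) / len(distances)

# Hypothetical 2-D positions in metres, for illustration only.
predicted = [(0.0, 0.0), (3.0, 4.0)]
actual = [(0.0, 0.0), (0.0, 0.0)]

distances = error_distances(predicted, actual)
score = rmse(predicted, actual)
share = fraction_within(distances, 1.0)
```

Evaluating `fraction_within` over a range of thresholds traces out the empirical CDF of the error distance, which is one common way to compare error distributions between positioning schemes.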
Abstract
Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they give equal attention to the entire input; ii) they are not well suited to non-sequential data, e.g., when only instantaneous Channel State Information (CSI) is available. In this context, we propose an attention-based Vision Transformer (ViT) architecture that focuses on the Angle Delay Profile (ADP) obtained from the CSI matrix. Our approach, validated on the 'DeepMIMO' and 'ViWi' ray-tracing datasets, achieves a Root Mean Squared Error (RMSE) of 0.55m indoors and 13.59m outdoors in DeepMIMO, and 3.45m in ViWi's outdoor blockage scenario. The proposed scheme outperforms state-of-the-art schemes by 38%. It also performs substantially better than the other approaches we considered in terms of the distribution of error distance.
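The abstract does not spell out how the ADP is obtained from the CSI matrix. One construction commonly used in the positioning literature applies a DFT across the antenna axis (to reach the angle domain) and an inverse DFT across the subcarrier axis (to reach the delay domain), then takes the magnitude. The sketch below assumes that construction and a CSI matrix of shape (antennas, subcarriers); the function name, the shapes, and the synthetic single-path channel are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def angle_delay_profile(csi):
    """Magnitude of the CSI matrix after moving to the angle-delay domain.

    csi: complex array of shape (num_antennas, num_subcarriers).
    Assumed construction: unitary DFT over antennas, unitary inverse DFT
    over subcarriers, then element-wise magnitude.
    """
    angle_domain = np.fft.fft(csi, axis=0, norm="ortho")
    angle_delay = np.fft.ifft(angle_domain, axis=1, norm="ortho")
    return np.abs(angle_delay)

# Synthetic single-path channel (illustrative values only): one spatial
# frequency bin and one delay bin, so the ADP should show a single peak.
num_antennas, num_subcarriers = 8, 16
angle_bin, delay_bin = 3, 5
n = np.arange(num_antennas)[:, None]
k = np.arange(num_subcarriers)[None, :]
csi = np.exp(2j * np.pi * n * angle_bin / num_antennas) * np.exp(
    -2j * np.pi * k * delay_bin / num_subcarriers
)

adp = angle_delay_profile(csi)
peak = np.unravel_index(np.argmax(adp), adp.shape)
```

The resulting real-valued 2-D array can be treated like a single-channel image, which is what makes a patch-based model such as a ViT a natural fit for this kind of input.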
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
