Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction

Rafayel Mkrtchyan; Edvard Ghukasyan; Khoren Petrosyan; Hrant Khachatrian; Theofanis P. Raptis

arXiv:2412.09507 · cs.CV · May 9, 2025

Open Access

TL;DR

This paper introduces a vision transformer-based deep learning model with pretrained weights for indoor radio pathloss prediction, emphasizing data augmentation and feature engineering to improve accuracy and robustness in complex environments.

Contribution

The work presents a novel application of vision transformers with pretrained weights for indoor radio map prediction, highlighting the importance of data augmentation and feature engineering.

Findings

01. Extensive data augmentation enhances model generalization.
02. Feature engineering is vital in low-data scenarios.
03. The model demonstrates robustness across various environments.

Abstract

Indoor pathloss prediction is a fundamental task in wireless network planning, yet it remains challenging due to environmental complexity and data scarcity. In this work, we propose a deep learning-based approach utilizing a vision transformer (ViT) architecture with DINO-v2 pretrained weights to model indoor radio propagation. Our method processes a floor map with additional features of the walls to generate indoor pathloss maps. We systematically evaluate the effects of architectural choices, data augmentation strategies, and feature engineering techniques. Our findings indicate that extensive augmentation significantly improves generalization, while feature engineering is crucial in low-data regimes. Through comprehensive experiments, we demonstrate the robustness of our model across different generalization scenarios.
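The abstract does not list the specific augmentations used, so the following is only a minimal sketch of one common choice for map-to-map prediction: applying the same random 90-degree rotation and horizontal flip to the input feature channels and to the target pathloss map. The function name `augment_pair`, the `(C, H, W)` channel layout, and the choice of transforms are assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def augment_pair(features, pathloss, rng):
    """Apply one random rotation/flip jointly to inputs and target.

    features: array of shape (C, H, W), e.g. floor-map and wall channels
              (hypothetical layout, not taken from the paper)
    pathloss: array of shape (H, W), the target pathloss map
    rng:      a numpy.random.Generator
    """
    k = int(rng.integers(0, 4))      # number of 90-degree rotations (0..3)
    flip = bool(rng.integers(0, 2))  # whether to mirror left-right

    # Rotate only the spatial axes so that channels stay aligned with the target.
    features = np.rot90(features, k=k, axes=(1, 2))
    pathloss = np.rot90(pathloss, k=k, axes=(0, 1))

    if flip:
        features = features[:, :, ::-1]
        pathloss = pathloss[:, ::-1]

    # Return contiguous copies, which most tensor libraries expect.
    return np.ascontiguousarray(features), np.ascontiguousarray(pathloss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))
    inputs = rng.random((3, 8, 8))
    aug_inputs, aug_target = augment_pair(inputs, target, rng)
    print(aug_inputs.shape, aug_target.shape)
```

The key design point is that the geometric transform is sampled once and applied to both arrays, so every input pixel still corresponds to the same location in the target map.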

Taxonomy

Topics: Indoor and Outdoor Localization Technologies · Speech and Audio Processing · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Layer Normalization · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer