OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

Stefan Maria Ailuro; Mario Markov; Mohammad Mahdi; Delyan Boychev; Luc Van Gool; Danda Pani Paudel (INSAIT; Sofia University "St. Kliment Ohridski")

arXiv:2603.11804 · cs.CV · March 26, 2026

TL;DR

OSMDA introduces a self-contained domain adaptation method for remote sensing VLMs that leverages OpenStreetMap data and the model's own OCR capabilities, eliminating reliance on costly external annotations or large teacher models.

Contribution

The paper presents OSMDA, a novel framework that enables remote sensing VLMs to adapt to new domains using only the model and publicly available OSM data, without manual labels or external teachers.

Findings

01. Achieves state-of-the-art results on 10 benchmarks.

02. Substantially cheaper to train than teacher-dependent methods.

03. Effective in aligning models with crowd-sourced geographic data.

Abstract

Vision-Language Models (VLMs) adapted to remote sensing rely heavily on domain-specific image-text supervision, yet high-quality annotations for satellite and aerial imagery remain scarce and expensive to produce. Prevailing pseudo-labeling pipelines address this gap by distilling knowledge from large frontier models, but this dependence on large teachers is costly, limits scalability, and caps achievable performance at the ceiling of the teacher. We propose OSMDA: a self-contained domain adaptation framework that eliminates this dependency. Our key insight is that a capable base VLM can serve as its own annotation engine: by pairing aerial images with rendered OpenStreetMap (OSM) tiles, we leverage optical character recognition and chart comprehension capabilities of the model to generate captions enriched by OSM's vast auxiliary metadata. The model is then fine-tuned on the resulting…
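
The pairing and self-annotation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each aerial image comes with a latitude and longitude, uses the standard slippy-map tile formula to find the matching OSM tile, and treats `caption_fn` as a hypothetical stand-in for the base VLM reading the aerial image together with the rendered tile. The names `latlon_to_tile`, `Sample`, and `build_pseudo_labels` are invented for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map (Web Mercator) tile indices for a WGS84 coordinate."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon = 180) stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


@dataclass
class Sample:
    aerial_path: str
    osm_tile: Tuple[int, int, int]  # (zoom, x, y)
    caption: str


def build_pseudo_labels(
    locations: Iterable[Tuple[str, float, float]],
    zoom: int,
    caption_fn: Callable[[str, Tuple[int, int, int]], str],
) -> List[Sample]:
    """Pair each aerial image with its OSM tile and let the model caption it.

    caption_fn is an assumption: it stands in for the base VLM that reads the
    aerial image and the rendered OSM tile and returns a caption.
    """
    samples = []
    for aerial_path, lat, lon in locations:
        x, y = latlon_to_tile(lat, lon, zoom)
        tile = (zoom, x, y)
        caption = caption_fn(aerial_path, tile)
        if caption.strip():  # drop empty captions instead of training on them
            samples.append(Sample(aerial_path, tile, caption))
    return samples
```

The resulting `Sample` list would then serve as the image-text pairs for fine-tuning. How the paper filters or post-processes captions is not stated in the excerpt above, so the empty-caption check here is only a placeholder.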

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques