OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
Stefan Maria Ailuro, Mario Markov, Mohammad Mahdi, Delyan Boychev, Luc Van Gool, Danda Pani Paudel (INSAIT, Sofia University "St. Kliment Ohridski")

TL;DR
OSMDA introduces a self-contained domain adaptation method for remote sensing VLMs that leverages OpenStreetMap data and the model's own OCR capabilities, eliminating reliance on costly external annotations or large teacher models.
Contribution
The paper presents OSMDA, a novel framework that enables remote sensing VLMs to adapt to new domains using only the model and publicly available OSM data, without manual labels or external teachers.
Findings
Achieves state-of-the-art results on 10 benchmarks.
Substantially cheaper to train than teacher-dependent methods.
Effective in aligning models with crowd-sourced geographic data.
Abstract
Vision-Language Models (VLMs) adapted to remote sensing rely heavily on domain-specific image-text supervision, yet high-quality annotations for satellite and aerial imagery remain scarce and expensive to produce. Prevailing pseudo-labeling pipelines address this gap by distilling knowledge from large frontier models, but this dependence on large teachers is costly, limits scalability, and caps achievable performance at the ceiling of the teacher. We propose OSMDA: a self-contained domain adaptation framework that eliminates this dependency. Our key insight is that a capable base VLM can serve as its own annotation engine: by pairing aerial images with rendered OpenStreetMap (OSM) tiles, we leverage optical character recognition and chart comprehension capabilities of the model to generate captions enriched by OSM's vast auxiliary metadata. The model is then fine-tuned on the resulting…
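The abstract does not say how aerial images are matched to rendered OSM tiles. As a minimal sketch of one way the pairing step could work, assume each aerial image comes with a WGS84 center coordinate and that tiles follow the standard Web Mercator "slippy map" scheme used by OpenStreetMap tile servers. The function names and the zoom level below are illustrative assumptions, not part of OSMDA.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Map a WGS84 coordinate to slippy-map tile indices (x, y) at a zoom level.

    Hypothetical helper: the paper does not specify its tiling scheme.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_bounds(x, y, zoom):
    """Return (south, west, north, east) in degrees for a slippy-map tile."""
    n = 2 ** zoom
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return south, west, north, east


if __name__ == "__main__":
    # Example coordinate (central Sofia); zoom 16 is an arbitrary choice.
    x, y = latlon_to_tile(42.6977, 23.3219, 16)
    print("tile:", x, y)
    print("bounds:", tile_bounds(x, y, 16))
```

With a tile index in hand, the matching rendered tile can be fetched or rendered and shown to the model next to the aerial image. The bounds let the aerial crop be aligned to the same ground footprint.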
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
