Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM

Hyobin Park; Minseok Seo; Dong-Geol Choi

arXiv:2605.02283·cs.CV·May 5, 2026

TL;DR

This study compares electro-optical (EO) and generalist vision foundation models for remote sensing image retrieval, finding that generalist models often perform as well as or better than EO-specific models, especially in cross-scene scenarios.

Contribution

The paper provides a controlled evaluation, using the same datasets, retrieval protocol, and evaluation metric for every model, showing that generalist models are competitive with EO-specific models on remote sensing retrieval tasks.

Findings

01

Generalist models are often as effective as, or better than, EO-specific models.

02

The performance of EO-specific models degrades significantly under cross-scene evaluation.

03

Pretraining strategies for EO models need improvement to better utilize remote sensing data.

Abstract

Vision foundation models have attracted significant attention for their ability to leverage large-scale unlabeled visual data. This advantage is particularly important in remote sensing, where data acquisition is costly and annotation often requires expert knowledge. Recent electro-optical vision foundation models aim to learn domain-specific representations from remote sensing imagery, but it remains unclear whether they are more effective than strong generalist vision foundation models under retrieval-based evaluation. In this study, we conduct a controlled comparison between representative EO-specific and generalist vision foundation models for remote sensing image retrieval. Using the same datasets, retrieval protocol, and evaluation metric, we evaluate both in-domain performance and cross-scene generalization. Our results show that strong generalist vision foundation models are…
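
To make the idea of a shared retrieval protocol concrete, the following is a minimal sketch of how embeddings from any backbone could be scored under one metric. It is not the authors' code: the excerpt does not name the metric or the similarity function, so cosine similarity and mean average precision (mAP) are assumptions, and the function names `l2_normalize` and `mean_average_precision` are hypothetical. In-domain evaluation is modelled here by using one set as both queries and gallery (with the query itself excluded), and cross-scene evaluation by passing a gallery drawn from a different scene.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mean_average_precision(query_emb, query_labels, gallery_emb, gallery_labels,
                           exclude_self=False):
    """Mean average precision for label-based retrieval (assumed metric).

    exclude_self=True assumes queries and gallery are the same set in the
    same order, and drops each query from its own ranked list.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    g = l2_normalize(np.asarray(gallery_emb, dtype=float))
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)

    sims = q @ g.T  # cosine similarity, shape (num_queries, num_gallery)
    average_precisions = []
    for i in range(len(q)):
        order = np.argsort(-sims[i], kind="stable")  # most similar first
        if exclude_self:
            order = order[order != i]
        relevant = gallery_labels[order] == query_labels[i]
        if not relevant.any():
            continue  # queries with no relevant gallery item are skipped
        hits = np.cumsum(relevant)
        ranks = np.arange(1, len(relevant) + 1)
        # Average of precision@k taken at each rank k holding a relevant item.
        average_precisions.append(float((hits[relevant] / ranks[relevant]).mean()))
    return float(np.mean(average_precisions)) if average_precisions else float("nan")


# Toy embeddings standing in for features from any foundation model.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])

# In-domain: the same set serves as queries and gallery.
in_domain_map = mean_average_precision(emb, labels, emb, labels, exclude_self=True)

# Cross-scene: queries are ranked against a gallery from a different scene.
cross_scene_map = mean_average_precision(
    np.array([[1.0, 0.0]]), np.array([0]),
    np.array([[1.0, 0.0], [0.9, 0.1]]), np.array([1, 0]),
)
```

Because the scoring function only sees embeddings and labels, swapping an EO-specific backbone for a generalist one changes nothing except the feature extractor, which is what keeps such a comparison controlled.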
