RemoteSAM: Towards Segment Anything for Earth Observation
Liang Yao, Fan Liu, Delong Chen, Chuanyi Zhang, Yijun Wang, Ziyun Chen, Wei Xu, Shimin Di, Yuhui Zheng

TL;DR
RemoteSAM is a new foundation model for Earth observation that leverages a large-scale dataset and a unified task paradigm to excel across diverse perception tasks with high efficiency.
Contribution
The paper introduces a scalable data engine and a task unification paradigm, enabling a versatile and efficient Earth observation foundation model called RemoteSAM.
Findings
Achieved state-of-the-art results on multiple Earth observation benchmarks.
Created the largest dataset of its kind to date: 270K image-text-mask triplets covering diverse semantic categories and attribute specifications.
Outperformed existing models like Falcon, GeoChat, and LHRS-Bot.
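To make the "image-text-mask triplet" in the findings above concrete, here is a minimal sketch of what one such record might look like. The class name, field names, and the toy 2x3 image are illustrative assumptions, not the paper's actual data format; the only property taken from the summary is that each sample pairs an image with a text description and a pixel mask.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    """One hypothetical image-text-mask record (all field names are assumptions)."""
    image: List[List[int]]  # H x W grid of pixel values (grayscale for simplicity)
    text: str               # free-form description of the target
    mask: List[List[int]]   # H x W binary grid, 1 marks a target pixel

    def __post_init__(self) -> None:
        # The mask must align pixel-for-pixel with the image.
        same_height = len(self.mask) == len(self.image)
        same_widths = same_height and all(
            len(m) == len(r) for m, r in zip(self.mask, self.image)
        )
        if not same_widths:
            raise ValueError("mask and image must have the same height and width")
        # The mask is binary by assumption.
        if any(v not in (0, 1) for row in self.mask for v in row):
            raise ValueError("mask values must be 0 or 1")

    def mask_area(self) -> int:
        """Number of pixels labelled as belonging to the target."""
        return sum(sum(row) for row in self.mask)


# Toy example: a 2 x 3 "image" whose top-right pixel is the described target.
example = Triplet(
    image=[[10, 20, 30], [40, 50, 60]],
    text="the small building in the top-right corner",
    mask=[[0, 0, 1], [0, 0, 0]],
)
```

Validating shape agreement at construction time is one simple way to catch misaligned samples early; a real pipeline would likely store images and masks as arrays or files rather than nested lists.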
Abstract
We aim to develop a robust yet flexible visual foundation model for Earth observation. It should possess strong capabilities in recognizing and localizing diverse visual targets while providing compatibility with various input-output interfaces required across different task scenarios. Current systems cannot meet these requirements, as they typically utilize task-specific architectures trained on narrow data domains with limited semantic coverage. Our study addresses these limitations from two aspects: data and modeling. We first introduce an automatic data engine that scales significantly better than previous human annotation or rule-based approaches. It has enabled us to create the largest dataset of its kind to date, comprising 270K image-text-mask triplets covering an unprecedented range of diverse semantic categories and attribute specifications. Based on this…
Taxonomy
Topics: Methane Hydrates and Related Phenomena · Robotics and Sensor-Based Localization
