OVerSeeC: Open-Vocabulary Costmap Generation from Satellite Images and Natural Language

Rwik Rana; Jesse Quattrociocchi; Dongmyeong Lee; Christian Ellis; Amanda Adkins; Adam Uccello; Garrett Warnell; Joydeep Biswas

arXiv:2602.18606 · cs.RO · March 10, 2026

TL;DR

This paper introduces OVerSeeC, a modular framework that uses foundation models to generate global costmaps from satellite images based on natural language instructions, enabling flexible, open-vocabulary, and mission-specific planning.

Contribution

The paper presents a novel zero-shot, modular approach that combines language models with perception pipelines to generate costmaps from satellite imagery according to natural-language instructions, addressing the limitations of fixed ontologies.

Findings

01. Handles novel entities and preferences effectively.

02. Produces routes aligned with human trajectories.

03. Demonstrates robustness to distribution shifts.

Abstract

Aerial imagery provides essential global context for autonomous navigation, enabling route planning at scales inaccessible to onboard sensing. We address the problem of generating global costmaps for long-range planning directly from satellite imagery when entities and mission-specific traversal rules are expressed in natural language at test time. This setting is challenging since mission requirements vary, terrain entities may be unknown at deployment, and user prompts often encode compositional traversal logic. Existing approaches relying on fixed ontologies and static cost mappings cannot accommodate such flexibility. While foundation models excel at language interpretation and open-vocabulary perception, no single model can simultaneously parse nuanced mission directives, locate arbitrary entities in large-scale imagery, and synthesize them into an executable cost function for…
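
To make the idea of turning named entities and traversal preferences into a costmap concrete, here is a minimal sketch. It is not the paper's pipeline: the function `compose_costmap`, the per-entity boolean masks, the hand-written `entity_costs` table, and the "highest cost wins" overlap rule are all assumptions chosen for illustration. In the setting the abstract describes, the entities and rules would come from a natural-language prompt and the masks from open-vocabulary perception, neither of which is modeled here.

```python
from typing import Dict, List

Mask = List[List[bool]]       # hypothetical: one boolean grid per named entity
Costmap = List[List[float]]   # per-cell traversal cost, higher means less desirable


def compose_costmap(masks: Dict[str, Mask],
                    entity_costs: Dict[str, float],
                    default_cost: float = 0.5) -> Costmap:
    """Give each cell the highest cost among the entities that cover it.

    Assumes `masks` is non-empty and that all masks share the same shape.
    The max rule is an illustrative choice, not taken from the paper.
    """
    first = next(iter(masks.values()))
    rows, cols = len(first), len(first[0])
    costmap = [[default_cost] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            covering = [entity_costs[name]
                        for name, mask in masks.items()
                        if name in entity_costs and mask[r][c]]
            if covering:
                costmap[r][c] = max(covering)
    return costmap


# Toy 2x3 scene: prefer roads, avoid water (costs are made up).
masks = {
    "road":  [[True, True, False], [False, False, False]],
    "water": [[False, True, False], [False, False, True]],
}
entity_costs = {"road": 0.1, "water": 1.0}
costmap = compose_costmap(masks, entity_costs)
print(costmap)
```

Cells covered by no entity keep `default_cost`. Where the road and water masks overlap, the water cost dominates under the assumed max rule. A different mission could swap in another overlap rule without changing the masks.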

Taxonomy

Topics: Automated Road and Building Extraction · Geographic Information Systems Studies · Multimodal Machine Learning Applications