ConInfer: Context-Aware Inference for Training-Free Open-Vocabulary Remote Sensing Segmentation

Wenyang Chen, Zhanxuan Hu, Yaping Zhang, Hailong Ning, and Yonghang Tai

arXiv:2603.29271·cs.CV·April 1, 2026

TL;DR

ConInfer is a context-aware inference framework that predicts jointly across spatial units and models their semantic dependencies, improving open-vocabulary remote sensing segmentation accuracy over independent patch-wise prediction.

Contribution

ConInfer is the first training-free remote sensing segmentation method to explicitly incorporate global context and inter-unit dependencies, which improves the consistency and robustness of its predictions.

Findings

01

Outperforms state-of-the-art baselines by 2.80% and 6.13% on benchmark datasets.

02

Enhances segmentation consistency and generalization in complex environments.

03

Demonstrates effectiveness across multiple remote sensing tasks.

Abstract

Training-free open-vocabulary remote sensing segmentation (OVRSS), empowered by vision-language models, has emerged as a promising paradigm for achieving category-agnostic semantic understanding in remote sensing imagery. Existing approaches mainly focus on enhancing feature representations or mitigating modality discrepancies to improve patch-level prediction accuracy. However, such independent prediction schemes are fundamentally misaligned with the intrinsic characteristics of remote sensing data. In real-world applications, remote sensing scenes are typically large-scale and exhibit strong spatial as well as semantic correlations, making isolated patch-wise predictions insufficient for accurate segmentation. To address this limitation, we propose ConInfer, a context-aware inference framework for OVRSS that performs joint prediction across multiple spatial units while explicitly…
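The contrast the abstract draws, between scoring each patch in isolation and letting neighboring spatial units inform one another, can be illustrated with a small sketch. This is not ConInfer's inference procedure, which the truncated abstract does not specify. It assumes toy 2-D patch and text embeddings, and it substitutes a generic 4-neighbor probability smoothing step for the joint prediction. All function names, the temperature, and the blending weight are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed value."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_probs(patch_grid, text_feats):
    """Class probabilities per patch, each patch scored in isolation."""
    return [[softmax([cosine(p, t) for t in text_feats]) for p in row]
            for row in patch_grid]

def smooth_with_neighbors(probs, weight=0.5, steps=1):
    """Blend each patch's probabilities with the mean of its 4-neighbors.

    A stand-in for context-aware inference, NOT the paper's algorithm.
    """
    h, w, k = len(probs), len(probs[0]), len(probs[0][0])
    for _ in range(steps):
        new = []
        for i in range(h):
            row = []
            for j in range(w):
                nbrs = [probs[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < h and 0 <= b < w]
                if not nbrs:  # a 1x1 grid has no neighbors; keep it unchanged
                    row.append(list(probs[i][j]))
                    continue
                mean = [sum(n[c] for n in nbrs) / len(nbrs) for c in range(k)]
                row.append([(1 - weight) * probs[i][j][c] + weight * mean[c]
                            for c in range(k)])
            new.append(row)
        probs = new
    return probs

def labels(probs):
    """Argmax class index for every patch."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]

# Toy scene: a 3x3 grid of class-0-like patches with one ambiguous center patch.
text_feats = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical class embeddings
clean, noisy = [1.0, 0.1], [0.45, 0.55]
patch_grid = [[clean, clean, clean],
              [clean, noisy, clean],
              [clean, clean, clean]]

independent = labels(independent_probs(patch_grid, text_feats))
refined = labels(smooth_with_neighbors(independent_probs(patch_grid, text_feats)))
print(independent)
print(refined)
```

In this toy scene the ambiguous center patch is labeled differently from its surroundings when scored alone, and agrees with them once neighbor evidence is blended in. That is the kind of inconsistency the abstract attributes to isolated patch-wise prediction.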

Code & Models

Repositories

Dog-Yang/ConInfer (GitHub)
