Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification

Karim El Khoury; Maxime Zanella; Benoît Gérin; Tiffanie Godelaine; Benoît Macq; Saïd Mahmoudi; Christophe De Vleeschouwer; Ismail Ben Ayed

arXiv:2409.00698·cs.CV·January 8, 2025

TL;DR

This paper improves zero-shot scene classification in remote sensing by applying transductive inference to vision-language models: patches are predicted jointly rather than independently, so contextual information is used without any supervision, yielding significant accuracy gains over inductive zero-shot classification.

Contribution

It introduces a transductive inference method that refines initial text-prompt predictions using patch affinity relationships from the image encoder, improving zero-shot classification in remote sensing without additional supervision.

Findings

01 Significant accuracy improvements over inductive methods on 10 datasets

02 Effective utilization of contextual information enhances zero-shot performance

03 Method maintains low computational cost

Abstract

Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and making independent predictions, i.e., inductive inference, thereby limiting their effectiveness by ignoring valuable contextual information. Our approach tackles this issue by utilizing initial predictions based on text prompting and patch affinity relationships from the image encoder to enhance zero-shot capabilities through transductive inference, all without the need for supervision and at a minor computational cost. Experiments on 10 remote sensing datasets with state-of-the-art Vision-Language Models demonstrate significant accuracy improvements over inductive zero-shot classification. Our source code is publicly available on GitHub:…
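To make the inductive versus transductive distinction concrete, the following is a minimal sketch and not the authors' algorithm. It assumes patch and class-prompt embeddings have already been extracted by some vision-language model, and it substitutes a plain label-propagation-style update for the paper's transductive objective. All function names (`zero_shot_probs`, `knn_affinity`, `transductive_refine`) and all parameter values (`temperature`, `k`, `weight`, `n_iters`) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def zero_shot_probs(patch_emb, text_emb, temperature=100.0):
    """Inductive step: score every patch against the class prompts on its own.

    temperature is an assumed scaling of the cosine similarities.
    """
    logits = temperature * l2_normalize(patch_emb) @ l2_normalize(text_emb).T
    return softmax(logits)


def knn_affinity(patch_emb, k=3):
    """Symmetric k-nearest-neighbour affinity from patch cosine similarity."""
    feats = l2_normalize(patch_emb)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)  # a patch is never its own neighbour
    n = sim.shape[0]
    k = min(k, n - 1)
    idx = np.argsort(-sim, axis=1)[:, :k]  # k most similar patches per row
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    w = np.zeros((n, n))
    w[rows, cols] = np.maximum(sim[rows, cols], 0.0)  # keep weights >= 0
    return np.maximum(w, w.T)  # symmetrize


def transductive_refine(probs, affinity, n_iters=10, weight=1.0):
    """Illustrative joint refinement (a stand-in, not the paper's update).

    Each patch keeps its text-based prediction as a prior and is pulled
    toward the current predictions of its neighbours in embedding space.
    """
    log_prior = np.log(probs + 1e-12)
    z = probs.copy()
    for _ in range(n_iters):
        z = softmax(log_prior + weight * affinity @ z)
    return z


if __name__ == "__main__":
    # Toy embeddings: two class prompts and three patches (made-up values).
    text_emb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    patch_emb = np.array([[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.45, 0.55, 1.0]])
    initial = zero_shot_probs(patch_emb, text_emb, temperature=10.0)
    refined = transductive_refine(initial, knn_affinity(patch_emb, k=2), weight=5.0)
    print("inductive labels:   ", initial.argmax(axis=1))
    print("transductive labels:", refined.argmax(axis=1))
```

With `weight=0.0` the refinement reduces to the independent per-patch predictions, which is the inductive baseline the abstract describes; larger values let neighbouring patches influence each other more strongly.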

Code & Models

Repositories

elkhouryk/rs-transclip (PyTorch, official)

Taxonomy

Topics: Remote-Sensing Image Classification