Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Mengxuan Hu, Zihan Guan, Sheng Li, Lan Mu

TL;DR
Text2Seg introduces a novel approach for remote sensing semantic segmentation that leverages visual foundation models and automatic prompt generation to reduce annotation dependency and improve zero-shot performance across diverse datasets.
Contribution
The paper presents Text2Seg, a method that reduces the need for per-pixel annotations and improves transferability in remote sensing segmentation by combining visual foundation models with automatically generated prompts.
Findings
Improves zero-shot segmentation performance significantly, with relative gains ranging from 31% to 225%.
Reduces reliance on fully annotated datasets through automatic prompt generation.
Enhances generalization ability across diverse remote sensing datasets.
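To make the "relative gain" figures concrete, here is a minimal sketch of how such a percentage is typically computed from a baseline score and an improved score. The function name and the example scores are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper, and the summary does not state which metric the gains refer to.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline` as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores, used only to show the calculation (not from the paper).
examples = [(0.32, 0.42), (0.20, 0.65)]
for base, new in examples:
    print(f"{base:.2f} -> {new:.2f}: {relative_gain(base, new):.0%} relative gain")
```

A relative gain above 100% means the improved score is more than double the baseline, which is why large percentages tend to appear when the baseline is low.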
Abstract
Remote sensing imagery has attracted significant attention in recent years due to its instrumental role in global environmental monitoring, land usage monitoring, and more. As image databases grow each year, performing automatic segmentation with deep learning models has gradually become the standard approach for processing the data. Despite the improved performance of current models, certain limitations remain unresolved. Firstly, training deep learning models for segmentation requires per-pixel annotations. Given the large size of datasets, only a small portion is fully annotated and ready for training. Additionally, the high intra-dataset variance in remote sensing data limits the transfer learning ability of such models. Although recently proposed generic segmentation models like SAM have shown promising results in zero-shot instance-level segmentation, adapting them to semantic…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Vision Transformer · Linear Layer · Adam · Dense Connections · Label Smoothing · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer
