Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation

Philip Hughes; Larry Burns; Luke Adams

arXiv:2501.16467·cs.CV·January 29, 2025

TL;DR

LangSeg introduces a novel approach that leverages large language models to generate fine-grained descriptors, enhancing semantic segmentation performance across diverse scenes without extensive retraining.

Contribution

The paper presents LangSeg, a new LLM-guided segmentation framework that integrates descriptor generation with Vision Transformers, improving accuracy and flexibility in semantic segmentation tasks.

Findings

1. Achieves up to 6.1% improvement in mIoU on ADE20K and COCO-Stuff datasets.

2. Outperforms state-of-the-art models in semantic segmentation.

3. Validated through ablation and human evaluation studies.
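
For readers unfamiliar with the metric, mean intersection-over-union (mIoU) averages, over classes, the overlap between predicted and ground-truth pixels divided by their union. The sketch below is a minimal pure-Python illustration on flattened label maps, not the paper's evaluation code; the function name `mean_iou` and the choice to skip classes absent from both maps are assumptions. The summary does not say whether the 6.1% figure is an absolute or a relative gain.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in either label map.

    `pred` and `target` are flattened per-pixel class indices in [0, num_classes).
    Classes absent from both maps are skipped (an assumption of this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 4 pixels and 2 classes:
# class 0 has IoU 1/2, class 1 has IoU 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```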

Abstract

Semantic segmentation plays a crucial role in enabling machines to understand and interpret visual scenes at a pixel level. While traditional segmentation methods have achieved remarkable success, their generalization to diverse scenes and unseen object categories remains limited. Recent advancements in large language models (LLMs) offer a promising avenue for bridging visual and textual modalities, providing a deeper understanding of semantic relationships. In this paper, we propose LangSeg, a novel LLM-guided semantic segmentation method that leverages context-sensitive, fine-grained subclass descriptors generated by LLMs. Our framework integrates these descriptors with a pre-trained Vision Transformer (ViT) to achieve superior segmentation performance without extensive model retraining. We evaluate LangSeg on two challenging datasets, ADE20K and COCO-Stuff, where it outperforms…
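
The abstract does not spell out how the LLM-generated descriptors are combined with the ViT features. One common pattern for descriptor-based labeling, shown here purely as an illustrative sketch and not as the authors' method, is to embed each subclass descriptor, compare it with each image-patch embedding by cosine similarity, and assign the patch to the class whose best-matching descriptor scores highest. All names (`classify_patches`, `descriptor_embeddings`) and the toy 2-D vectors are hypothetical; in practice the embeddings would come from a text encoder and a pre-trained ViT.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_patches(
    patch_embeddings: List[Vector],
    descriptor_embeddings: Dict[str, List[Vector]],
) -> List[Optional[str]]:
    """Label each patch with the class whose best descriptor is most similar.

    `descriptor_embeddings` maps a class name to the embeddings of its
    subclass descriptors (hypothetical stand-ins for LLM-generated text).
    """
    labels: List[Optional[str]] = []
    for patch in patch_embeddings:
        best_class: Optional[str] = None
        best_score = -math.inf
        for class_name, descriptors in descriptor_embeddings.items():
            if not descriptors:
                continue  # a class with no descriptors cannot be matched
            score = max(cosine(patch, d) for d in descriptors)
            if score > best_score:
                best_class, best_score = class_name, score
        labels.append(best_class)
    return labels


# Toy 2-D embeddings (made up for illustration only).
descriptors = {
    "chair": [[1.0, 0.0], [0.9, 0.1]],  # e.g. "office chair", "armchair"
    "table": [[0.0, 1.0]],              # e.g. "dining table"
}
patches = [[0.8, 0.2], [0.1, 0.9]]
labels = classify_patches(patches, descriptors)
```

Taking the maximum over a class's descriptors lets any single fine-grained description trigger the class; averaging the scores instead would be an equally plausible design choice.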

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis

Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Vision Transformer · Multi-Head Attention