LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Junchi Wang, Lei Ke

TL;DR
This paper introduces LLM-Seg, a framework that combines large language model reasoning with image segmentation so that a system can interpret implicit user intentions in segmentation tasks, along with a new dataset, LLM-Seg40K.
Contribution
The work presents a new reasoning segmentation framework and an automatic dataset generation pipeline, creating the LLM-Seg40K dataset for training and evaluating reasoning segmentation models.
Findings
LLM-Seg achieves competitive performance on reasoning segmentation tasks.
The dataset pipeline efficiently produces high-quality reasoning segmentation data.
LLM-Seg40K serves as a new benchmark for reasoning segmentation approaches.
Abstract
Understanding human instructions to identify the target objects is vital for perception systems. In recent years, advances in Large Language Models (LLMs) have introduced new possibilities for image segmentation. In this work, we delve into reasoning segmentation, a novel task that enables a segmentation system to reason about and interpret implicit user intention via large language model reasoning and then segment the corresponding target. Our work on reasoning segmentation contributes to both methodological design and dataset labeling. For the model, we propose a new framework named LLM-Seg. LLM-Seg effectively connects the current foundational Segment Anything Model (SAM) and the LLM through mask proposal selection. For the dataset, we propose an automatic data generation pipeline and construct a new reasoning segmentation dataset named LLM-Seg40K. Experiments demonstrate that our…
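The abstract says only that the segmentation model and the LLM are connected by selecting among mask proposals; it does not spell out the scoring rule. The sketch below is one illustrative way such a selection step could look, not the authors' implementation. It assumes each mask proposal and the LLM output have already been turned into plain embedding vectors, and it uses cosine similarity with a fixed threshold. The names `cosine`, `select_proposals`, and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_proposals(
    query: Sequence[float],
    proposals: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[int, float]]:
    """Score every mask proposal against the query embedding.

    `query` stands in for an embedding derived from the LLM output and
    `proposals` for per-mask embeddings (both are assumptions of this sketch).
    Returns (proposal_index, score) pairs with score >= threshold,
    sorted from best to worst.
    """
    scored = [(i, cosine(query, emb)) for i, emb in enumerate(proposals)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings: proposal 0 matches the query, proposal 1 is orthogonal.
    query = [1.0, 0.0]
    proposals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for index, score in select_proposals(query, proposals):
        print(f"proposal {index}: score {score:.3f}")
```

In a real system the embeddings would come from the segmentation backbone and the language model, and the selection rule could be learned rather than a fixed threshold; the sketch only shows the "score proposals, keep the best" shape of the idea.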
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
