LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
Yang Miao, Jan-Nico Zaech, Xi Wang, Fabien Despinoy, Danda Pani Paudel, Luc Van Gool

TL;DR
LangHOPS is a Multimodal Large Language Model (MLLM) based framework for open-vocabulary hierarchical object-part instance segmentation. It grounds object-part hierarchies in language space and draws on the MLLM's knowledge and reasoning capabilities, achieving state-of-the-art results in in-domain, cross-dataset, and zero-shot settings.
Contribution
LangHOPS is the first MLLM-based approach for open-vocabulary object-part instance segmentation. It grounds object-part hierarchies in language space instead of relying on heuristic or learnable visual grouping.
Findings
Surpasses previous methods by 5.5% AP on PartImageNet in the in-domain setting
Achieves 4.8% higher AP than previous methods in cross-dataset evaluation
Improves mIoU by 2.5% on unseen parts in ADE20K
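For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean Intersection-over-Union is commonly computed for semantic segmentation. It assumes flattened integer label maps and averages over classes that appear in either the prediction or the ground truth. The function name `mean_iou` and the toy labels are illustrative assumptions, not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels of equal length.
    This is a generic illustration, not the evaluation protocol of the paper.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark implementations typically accumulate intersections and unions over the whole dataset before dividing, so per-image averaging as shown here is only one possible convention.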
Abstract
We propose LangHOPS, the first Multimodal Large Language Model (MLLM) based framework for open-vocabulary object-part instance segmentation. Given an image, LangHOPS can jointly detect and segment hierarchical object and part instances from open-vocabulary candidate categories. Unlike prior approaches that rely on heuristic or learnable visual grouping, our approach grounds object-part hierarchies in language space. It integrates the MLLM into the object-part parsing pipeline to leverage its rich knowledge and reasoning capabilities, and link multi-granularity concepts within the hierarchies. We evaluate LangHOPS across multiple challenging scenarios, including in-domain and cross-dataset object-part instance segmentation, and zero-shot semantic segmentation. LangHOPS achieves state-of-the-art results, surpassing previous methods by 5.5% Average Precision (AP) (in-domain) and 4.8%…
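To make the idea of grounding an object-part hierarchy in language more concrete, here is a minimal sketch of how open-vocabulary candidate categories at two granularities might be expressed as text. The abstract does not specify how LangHOPS represents hierarchies, so everything below (the dictionary layout, the `build_candidates` helper, and the "object part" naming pattern) is a hypothetical illustration and not the authors' implementation.

```python
from typing import Dict, List


def build_candidates(hierarchy: Dict[str, List[str]]) -> List[str]:
    """Flatten an object -> parts mapping into text candidate categories.

    Hypothetical helper: each object name is kept as a candidate, and each
    part is qualified by its parent object so the text itself carries the
    hierarchy (for example "dog head" rather than just "head").
    """
    candidates = []
    for obj, parts in hierarchy.items():
        candidates.append(obj)  # object-level category
        for part in parts:
            candidates.append(f"{obj} {part}")  # part-level category
    return candidates


# Assumed toy vocabulary; any object and part names could be supplied.
hierarchy = {"dog": ["head", "leg"], "car": ["wheel"]}
candidates = build_candidates(hierarchy)
print(candidates)
```

Because the candidates are plain strings, new objects or parts can be added at inference time without changing a fixed label set, which is the sense in which such a vocabulary is open.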
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
