PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Junyi Li; Junfeng Wu; Weizhi Zhao; Song Bai; Xiang Bai

arXiv:2407.16696 · cs.CV · July 24, 2024

TL;DR

PartGLEE is a unified foundation model that recognizes and parses objects and their parts at any granularity, improving hierarchical understanding and perception in images for open-world scenarios.

Contribution

The paper introduces a hierarchical framework that uses a Q-Former to parse objects into their parts, extending recognition to the part level beyond previous models such as GLEE.

Findings

01 Achieves state-of-the-art performance on part-level tasks

02 Obtains competitive results on object-level tasks

03 Enhances hierarchical modeling and detailed image comprehension

Abstract

We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts. By incorporating a large amount of object-level data, the hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive studies to validate the effectiveness of our method. PartGLEE achieves state-of-the-art performance across various part-level tasks and obtains competitive results on object-level tasks. The proposed PartGLEE significantly enhances hierarchical modeling capabilities and part-level…
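
To make the object-to-part parsing idea concrete, here is a minimal NumPy sketch of Q-Former-style cross-attention, in which a shared set of learnable queries attends to each object's features to produce per-object part queries. It is an illustration under stated assumptions, not the paper's implementation: the function name `parse_objects`, the representation of each object as several feature tokens, the residual connection, and all shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parse_objects(object_tokens, parsing_queries):
    """Hypothetical object-to-part parsing step (assumed design, not the paper's code).

    object_tokens:   (N, T, D) array, T feature tokens for each of N objects (assumption).
    parsing_queries: (P, D) array, queries shared across objects (random here, learned in practice).
    Returns part queries of shape (N, P, D) and each part's parent object index, shape (N, P).
    """
    n, t, d = object_tokens.shape
    p = parsing_queries.shape[0]
    # scores[i, j, k]: affinity of parsing query j to token k of object i.
    scores = np.einsum("pd,ntd->npt", parsing_queries, object_tokens) / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each query's weights over one object's tokens sum to 1
    attended = np.einsum("npt,ntd->npd", weights, object_tokens)
    # Residual connection: each part query is the shared query plus object-specific context.
    part_queries = parsing_queries[None, :, :] + attended
    # Record the hierarchy explicitly: part (i, j) belongs to object i.
    parent_index = np.repeat(np.arange(n)[:, None], p, axis=1)
    return part_queries, parent_index

rng = np.random.default_rng(0)
object_tokens = rng.normal(size=(3, 5, 16))   # 3 objects, 5 tokens each, 16-dim features
parsing_queries = rng.normal(size=(4, 16))    # 4 part slots per object
part_queries, parent_index = parse_objects(object_tokens, parsing_queries)
print(part_queries.shape)  # (3, 4, 16)
```

Because the same parsing queries are reused for every object, the number of part slots does not depend on the object category. This is one plausible way such a hierarchy could be extended with object-level data, though the actual mechanism is described in the paper itself.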

Code & Models

Repositories

ProvenceStar/PartGLEE (PyTorch, official)