HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Cong Wei; Yujie Zhong; Haoxian Tan; Yong Liu; Zheng Zhao; Jie Hu; Yujiu Yang

arXiv:2411.17606·cs.CV·December 3, 2024

TL;DR

HyperSeg is a novel VLLM-based universal segmentation model capable of handling both image and video perception tasks, including complex reasoning, by integrating hybrid recognition modules and temporal understanding.

Contribution

It introduces HyperSeg, the first universal segmentation model leveraging VLLMs for pixel-level perception across images and videos with reasoning capabilities.

Findings

1. Effective in universal image and video segmentation tasks
2. Handles complex reasoning perception tasks
3. Outperforms existing methods in accuracy

Abstract

This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite significant progress in current unified segmentation methods, limitations in adaptation to both image and video scenarios, as well as the complex reasoning segmentation, make it difficult for them to handle various challenging instructions and achieve an accurate understanding of fine-grained vision-language correlations. We propose HyperSeg, the first VLLM-based universal segmentation model for pixel-level image and video perception, encompassing generic segmentation tasks and more complex reasoning perception tasks requiring powerful reasoning abilities and world knowledge. Besides, to fully leverage the recognition capabilities of VLLMs and the fine-grained visual information, HyperSeg incorporates hybrid entity…

Code & Models

Repositories

congvvc/HyperSeg
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques