Towards Open-Ended Visual Recognition with Large Language Model

Qihang Yu; Xiaohui Shen; Liang-Chieh Chen

arXiv:2311.08400 · cs.CV · November 15, 2023 · 1 citation

TL;DR

This paper introduces the OmniScient Model, a large language model-based mask classifier that predicts class labels generatively, eliminating the need for predefined class names during training and testing, and enabling robust open-ended visual recognition.

Contribution

The OmniScient Model is a novel LLM-based mask classifier that predicts class labels generatively, allowing open-ended recognition without human intervention or predefined class sets.

Findings

1. Achieves promising results on various benchmarks.
2. Effectively handles novel concepts in visual recognition.
3. Enables cross-dataset training without human interference.

Abstract

Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP) using pre-extracted text embeddings. However, it is worth noting that these open-vocabulary recognition models still exhibit limitations in practical applications. On one hand, they rely on the provision of class names during testing, where the recognition performance heavily depends on this predefined set of semantic classes by users. On the other hand, when training with multiple datasets, human intervention is required to alleviate the label definition conflict between them. In this paper, we introduce the OmniScient Model (OSM), a novel Large Language Model (LLM) based mask…
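
The first limitation named in the abstract, dependence on class names supplied at test time, can be illustrated with a toy example. The sketch below is not the paper's code and does not call CLIP; the 3-dimensional embeddings, the `text_bank` dictionary, and the function `classify_with_vocabulary` are hypothetical names made up for illustration. It only shows that a classifier matching a mask embedding against pre-extracted text embeddings can return nothing outside the vocabulary it is given.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_with_vocabulary(mask_embedding, text_embeddings):
    """Return the class name whose text embedding best matches the mask embedding.

    Hypothetical helper: text_embeddings maps class name -> embedding vector.
    """
    return max(
        text_embeddings,
        key=lambda name: cosine(mask_embedding, text_embeddings[name]),
    )


# Made-up embeddings standing in for pre-extracted text features.
text_bank = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}

# Made-up mask embedding that lies closest to the "zebra" direction.
mask_embedding = [0.1, 0.2, 0.9]

# With the full vocabulary, the closest class name is selected.
full = classify_with_vocabulary(mask_embedding, text_bank)

# If the user omits "zebra", the classifier is forced onto a wrong name.
restricted_bank = {name: text_bank[name] for name in ("cat", "dog")}
restricted = classify_with_vocabulary(mask_embedding, restricted_bank)

print(full, restricted)
```

In this toy setting the prediction changes as soon as the relevant name is left out of the vocabulary, which is the dependence a generative classifier such as the one the abstract introduces is meant to avoid.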

Code & Models

Repositories

bytedance/omniscient-model (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Handwritten Text Recognition Techniques

Methods: Sparse Evolutionary Training