Visual Recognition by Request

Chufeng Tang; Lingxi Xie; Xiaopeng Zhang; Xiaolin Hu; Qi Tian

arXiv:2207.14227 · cs.CV · December 13, 2022

TL;DR

This paper introduces ViRReq (visual recognition by request), a recognition paradigm that decomposes visual recognition into atomic requests and uses a hierarchical, text-based knowledge base to define them. This makes recognition flexible and hierarchical, and makes it easier to handle whole-part structures and add new concepts.
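To make the idea of atomic requests concrete, here is a minimal Python sketch, assuming the knowledge base is a plain dictionary that maps each concept to its parts. The concept names, `KNOWLEDGE_BASE`, `children` and `decompose` are hypothetical illustrations, not taken from the ViRReq code or the CPP/ADE20K label sets, and real requests would target individual instances rather than whole concepts.

```python
from collections import deque

# Hypothetical knowledge base: each concept maps to its parts (sub-concepts).
# Names are illustrative only, not the CPP or ADE20K label sets.
KNOWLEDGE_BASE = {
    "scene": ["person", "car"],
    "person": ["head", "torso", "arm", "leg"],
    "head": ["eye", "nose", "mouth"],
    "car": ["wheel", "door", "window"],
}


def children(concept):
    """Return the parts of `concept`, or an empty list for a leaf concept."""
    return KNOWLEDGE_BASE.get(concept, [])


def decompose(root="scene", max_depth=None):
    """Break full hierarchical parsing into a list of atomic requests.

    Each request is a (concept, depth, candidates) tuple meaning "partition a
    region of `concept` into regions labeled with `candidates`". Requests are
    generated breadth-first, so a whole is always handled before its parts.
    `max_depth` limits how deep the parsing goes (a coarser granularity).
    """
    requests = []
    queue = deque([(root, 0)])
    while queue:
        concept, depth = queue.popleft()
        parts = children(concept)
        if not parts or (max_depth is not None and depth >= max_depth):
            continue
        requests.append((concept, depth, parts))
        queue.extend((part, depth + 1) for part in parts)
    return requests


for request in decompose():
    print(request)
```

With this toy hierarchy, `decompose()` yields one request each for `scene`, `person`, `car` and `head`, in that order. Passing `max_depth=1` stops after the top-level split, which loosely corresponds to asking for a coarser granularity.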

Contribution

The paper proposes a request-based recognition framework that learns whole-part hierarchies from highly incomplete annotations and incorporates new concepts with minimal effort.
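Because the knowledge base is text-based, adding a concept can amount to editing the dictionary rather than resizing a fixed classifier. The sketch below assumes that design; `BASE_KB` and `add_concept` are hypothetical names chosen for illustration and do not describe the authors' implementation.

```python
import copy

# Hypothetical knowledge base fragment: concept -> list of part names.
BASE_KB = {
    "car": ["wheel", "door", "window"],
    "wheel": [],
    "door": [],
    "window": [],
}


def add_concept(kb, parent, name):
    """Return a copy of `kb` with `name` registered as a new part of `parent`.

    Assumption: recognition compares visual features with text embeddings of
    concept names, so no output layer has to be resized. The new concept may
    still need some labeled examples to be recognized well.
    """
    if parent not in kb:
        raise KeyError(f"unknown parent concept {parent!r}")
    if name in kb[parent]:
        raise ValueError(f"{name!r} is already a part of {parent!r}")
    new_kb = copy.deepcopy(kb)  # leave the original knowledge base untouched
    new_kb[parent].append(name)
    new_kb.setdefault(name, [])  # register the new concept as a leaf
    return new_kb


extended = add_concept(BASE_KB, "car", "license plate")
print(extended["car"])  # the "split a car into parts" request now has 4 candidates
```

Returning a copy keeps the original dictionary usable, so a model evaluated with the old and the extended knowledge base can be compared side by side.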

Findings

01. Flexible recognition of whole-part hierarchies on the CPP and ADE20K datasets.
02. Ability to learn from highly incomplete annotations and to add new concepts with minimal effort.
03. A baseline that integrates language-driven recognition into recent semantic and instance segmentation methods.
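One common way to implement language-driven recognition is to compare per-pixel embeddings with text embeddings of the candidate class names. The NumPy sketch below assumes that approach; `classify_pixels` and the `temperature` value are illustrative choices, whether ViRReq's baseline matches this in detail is not stated here, and in practice `text_embeds` would come from a pretrained text encoder rather than being written by hand.

```python
import numpy as np


def classify_pixels(pixel_feats, text_embeds, temperature=0.07):
    """Label each pixel with the closest candidate concept for one request.

    pixel_feats: (H, W, D) array of per-pixel visual embeddings.
    text_embeds: (K, D) array, one embedding per candidate concept name.
    Returns (H, W) indices into the K candidates and (H, W, K) probabilities.
    """
    # L2-normalize both sides so the dot product is a cosine similarity.
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    logits = p @ t.T / temperature  # (H, W, D) @ (D, K) -> (H, W, K)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax over candidates
    return probs.argmax(axis=-1), probs


# Toy example: 3 candidate concepts with hand-made embeddings, 2x2 image.
text = np.eye(3)
feats = np.array([[[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]],
                  [[0.0, 0.0, 2.0], [0.2, 0.9, 0.0]]])
labels, probs = classify_pixels(feats, text)
print(labels)
```

Since the candidates are just rows of `text_embeds`, the candidate set can differ from request to request, and a new concept only adds a row rather than changing any weights.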

Abstract

Humans have the ability to recognize visual semantics at unlimited granularity, but existing visual recognition algorithms cannot achieve this goal. In this paper, we establish a new paradigm named visual recognition by request (ViRReq) to bridge the gap. The key lies in decomposing visual recognition into atomic tasks named requests and leveraging a knowledge base, a hierarchical and text-based dictionary, to assist task definition. ViRReq allows for (i) learning complicated whole-part hierarchies from highly incomplete annotations and (ii) inserting new concepts with minimal effort. We also establish a solid baseline by integrating language-driven recognition into recent semantic and instance segmentation methods, and demonstrate its flexible recognition ability on CPP and ADE20K, two datasets with hierarchical whole-part annotations.
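One reading of how requests help with incomplete annotations is that a request is only issued, and only contributes a loss, where its parts were actually labeled. The sketch below illustrates that reading under stated assumptions: the nested-dict annotation format, the path-style region names and `supervised_requests` are hypothetical, and the actual ViRReq training procedure may differ.

```python
def supervised_requests(annotation, root="image"):
    """Collect the requests that a partially annotated image can supervise.

    `annotation` maps each annotated region to a dict of its annotated parts,
    or to None when that region was never split into parts. Regions are named
    by their path from the root so that, e.g., two persons' heads stay distinct.
    Returns (region_path, sorted_part_names) tuples.
    """
    requests = []
    stack = [(root, annotation)]
    while stack:
        region, parts = stack.pop()
        if not parts:  # None or {}: no part labels, so no request and no loss
            continue
        requests.append((region, tuple(sorted(parts))))
        stack.extend((f"{region}/{name}", sub) for name, sub in parts.items())
    return requests


# Hypothetical annotation: only person#1 (and its head) were labeled at part level.
example = {
    "person#1": {"head": {"eye": None}, "arm": None},
    "person#2": None,
    "car#1": None,
}

for region, parts in sorted(supervised_requests(example)):
    print(region, "->", parts)
```

In this example, `person#2` and `car#1` are known to exist but were never split into parts, so they generate no part-level request. The model is simply not supervised there, rather than being told that the parts are absent.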

Code & Models

Repositories

chufengt/ViRReq
PyTorch (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Balanced Selection