Towards General Purpose Vision Systems

Tanmay Gupta; Amita Kamath; Aniruddha Kembhavi; Derek Hoiem

arXiv:2104.00743·cs.CV·April 21, 2022·6 cites

TL;DR

This paper introduces GPV-1, a versatile vision-language model capable of handling diverse tasks without architectural modifications, aiming to simplify the development of general-purpose vision systems.

Contribution

The paper presents GPV-1, a task-agnostic architecture for vision tasks, along with evaluation methods for generality, transfer, and efficiency, advancing towards truly general-purpose vision systems.

Findings

01. GPV-1 performs well across multiple tasks.

02. GPV-1 can do zero-shot referring expressions.

03. Few-shot training improves GPV-1's zero-shot performance.

Abstract

Computer vision systems today are primarily N-purpose systems, designed and trained for a predefined set of tasks. Adapting such systems to new tasks is challenging and often requires non-trivial modifications to the network architecture (e.g. adding new output heads) or training process (e.g. adding new losses). To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process. In this paper, we propose GPV-1, a task-agnostic vision-language architecture that can learn and perform tasks that involve receiving an image and producing text and/or bounding boxes, including classification, localization, visual question answering, captioning, and more. We also propose evaluations of generality of architecture, skill-concept…
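The input/output contract described in the abstract (one image in, text and/or bounding boxes out, with no per-task heads) can be pictured with a small sketch. This is not the GPV-1 implementation: `Prediction`, `toy_model`, `run_task`, the box coordinate convention, and the use of a free-text prompt to name the task are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in normalized image coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Prediction:
    """One output type shared by every task: optional text plus zero or more boxes."""
    text: Optional[str] = None
    boxes: List[Box] = field(default_factory=list)


# A task-agnostic model has a single signature regardless of the task.
# Assumption: the task is described by a text prompt passed alongside the image.
Model = Callable[[object, str], Prediction]


def toy_model(image: object, prompt: str) -> Prediction:
    """Stand-in for a trained model; it ignores its inputs and returns fixed values."""
    return Prediction(text="dog", boxes=[(0.1, 0.2, 0.5, 0.9)])


def run_task(model: Model, image: object, prompt: str,
             wants_text: bool, wants_boxes: bool) -> Prediction:
    """Call the same model for any task and keep only the outputs that task uses."""
    out = model(image, prompt)
    return Prediction(
        text=out.text if wants_text else None,
        boxes=out.boxes if wants_boxes else [],
    )


if __name__ == "__main__":
    image = None  # placeholder; a real system would pass pixel data here
    vqa = run_task(toy_model, image, "What animal is this?", True, False)
    loc = run_task(toy_model, image, "Locate the dog.", False, True)
    print(vqa)
    print(loc)
```

The point of the sketch is that adding a new task changes only the prompt and which output fields the caller reads; the model's signature and output type stay fixed.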

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling