UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

Alexander Kolesnikov; André Susano Pinto; Lucas Beyer; Xiaohua Zhai; Jeremiah Harmsen; Neil Houlsby

arXiv:2205.10337 · cs.CV · October 17, 2022 · 23 citations

TL;DR

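UViM is a unified, task-agnostic approach to vision tasks: a feed-forward base model predicts the raw output, guided by a learned discrete code, and an autoregressive language model generates that code. It achieves competitive results on panoptic segmentation, depth prediction and image colorization.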

Contribution

It introduces a single modeling framework that has the same functional form for every task, removing the need for task-specific modifications. The framework pairs a feed-forward base model with an autoregressive language model that generates the guiding code.

Findings

01

Achieves competitive, near state-of-the-art results on three vision tasks: panoptic segmentation, depth prediction and image colorization.

02

Applies the same model form to tasks with very different outputs: segmentation maps, depth maps and color images.

03

Shows that a unified model can be effective without task-specific modifications.

Abstract

We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (I) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (II) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near…
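The division of labor described in the abstract can be illustrated with a toy sketch of the prediction path: the language model emits a short discrete code one token at a time, and the base model then produces the full dense output in a single pass, conditioned on the image and that code. This is not the authors' implementation (the official code is in google-research/big_vision, written in JAX). Every name here (`toy_language_model`, `toy_base_model`, `generate_code`, `predict`), as well as `VOCAB_SIZE` and `CODE_LEN`, is a hypothetical stand-in chosen for illustration, and the "models" are fixed arithmetic rules rather than trained networks.

```python
import random
from typing import Callable, List, Optional, Sequence

VOCAB_SIZE = 8  # hypothetical number of distinct code tokens
CODE_LEN = 4    # hypothetical length of the guiding code


def toy_language_model(image: Sequence[float], prefix: Sequence[int]) -> List[float]:
    """Stand-in for component (II): a distribution over the next code token,
    conditioned on the image and on the tokens generated so far."""
    seed = sum(image) + sum((i + 1) * t for i, t in enumerate(prefix))
    scores = [1.0 + ((int(seed * 10) + k) % VOCAB_SIZE) for k in range(VOCAB_SIZE)]
    total = sum(scores)
    return [s / total for s in scores]


def generate_code(
    image: Sequence[float],
    lm: Callable[[Sequence[float], Sequence[int]], List[float]],
    length: int = CODE_LEN,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Autoregressive decoding: one token per step, each step sees the prefix."""
    code: List[int] = []
    for _ in range(length):
        probs = lm(image, code)
        if rng is None:
            # Greedy decoding: pick the most probable token.
            token = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sampling: draw a token according to the predicted distribution.
            token = rng.choices(range(len(probs)), weights=probs)[0]
        code.append(token)
    return code


def toy_base_model(image: Sequence[float], code: Sequence[int]) -> List[float]:
    """Stand-in for component (I): a single feed-forward pass that maps
    (image, code) to a dense output with one value per input pixel."""
    bias = sum(code) / (len(code) * VOCAB_SIZE)
    return [pixel + bias for pixel in image]


def predict(image, lm, base) -> List[float]:
    """Prediction path: generate the short code first, then the dense output."""
    code = generate_code(image, lm)
    return base(image, code)


if __name__ == "__main__":
    image = [0.1, 0.5, 0.9, 0.3]  # a four-"pixel" toy image
    print(generate_code(image, toy_language_model))
    print(predict(image, toy_language_model, toy_base_model))
```

The sketch mirrors the complementarity the abstract points to: the sequential, token-by-token loop only runs over the short code, while the high-dimensional output comes from one feed-forward call.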

Code & Models

Repositories

google-research/big_vision
JAX · Official

Videos

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection