UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

TL;DR
UViM is a unified, task-agnostic approach to diverse vision tasks: a base model predicts the output, guided by a learned discrete code, and a language model generates that code. It achieves competitive, near state-of-the-art results.
Contribution
Introduces a universal modeling framework that needs no task-specific modifications. A feed-forward base model predicts raw vision outputs, guided by a learned discrete code, and an autoregressive language model generates that guiding code.
Findings
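Achieves competitive, near state-of-the-art results on three vision tasks: panoptic segmentation, depth prediction, and image colorization.
Uses the same functional form for all three tasks, which differ widely in output type.
Supports the case that unified modeling can work without task-specific modifications.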
Abstract
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (I) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (II) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near…
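The two-component data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_next_token` and `toy_base_model` are hypothetical stand-ins for trained networks, and `CODE_LENGTH` and `VOCAB_SIZE` are arbitrary values chosen for illustration. The sketch only shows the control flow: the code is generated token by token (autoregressive), and the output is then produced in a single pass (feed-forward) conditioned on the image and the code.

```python
from typing import Callable, List, Sequence, Tuple

CODE_LENGTH = 4  # hypothetical: number of discrete tokens in the guiding code
VOCAB_SIZE = 8   # hypothetical: number of distinct code tokens


def generate_code(
    image: Sequence[float],
    next_token: Callable[[Sequence[float], List[int]], int],
    length: int = CODE_LENGTH,
) -> List[int]:
    """Autoregressive generation: each token depends on the image and on
    every token generated before it."""
    code: List[int] = []
    for _ in range(length):
        code.append(next_token(image, code))
    return code


def toy_next_token(image: Sequence[float], prefix: List[int]) -> int:
    """Stand-in for the language model (a fixed rule, not a trained model)."""
    return (int(sum(image) * 10) + sum(prefix) + len(prefix)) % VOCAB_SIZE


def toy_base_model(image: Sequence[float], code: List[int]) -> List[float]:
    """Stand-in for the base model: one feed-forward pass that maps
    (image, code) to an output with one value per input element."""
    bias = sum(code) / (len(code) * VOCAB_SIZE)
    return [pixel + bias for pixel in image]


def predict(image: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Inference: first generate the guiding code, then run the base model."""
    code = generate_code(image, toy_next_token)
    return toy_base_model(image, code), code


if __name__ == "__main__":
    output, code = predict([0.1, 0.2, 0.3])
    print("guiding code:", code)
    print("output:", output)
```

The split mirrors the division of labor the abstract describes: the sequential loop handles a short, structured sequence, while the high-dimensional output is produced in one pass.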
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection
