Plex: Towards Reliability using Pretrained Large Model Extensions

Dustin Tran; Jeremiah Liu; Michael W. Dusenberry; Du Phan; Mark Collier; Jie Ren; Kehang Han; Zi Wang; Zelda Mariet; Huiyi Hu; Neil Band; Tim G. J. Rudner; Karan Singhal; Zachary Nado; Joost van Amersfoort; Andreas Kirsch; Rodolphe Jenatton; Nithum Thain; Honglin Yuan; Kelly Buchanan; Kevin Murphy; D. Sculley; Yarin Gal; Zoubin Ghahramani; Jasper Snoek; Balaji Lakshminarayanan

arXiv:2207.07411 · cs.LG · July 18, 2022 · 38 cites

TL;DR

This paper introduces Plex, a set of pretrained large model extensions for vision and language that significantly enhance reliability across diverse decision-making tasks, outperforming previous methods without extensive tuning.

Contribution

The paper presents Plex, novel pretrained model extensions for vision and language that improve reliability across multiple tasks and simplify evaluation protocols.

Findings

01 Plex achieves state-of-the-art reliability performance.

02 Scaling model size and data improves reliability.

03 Effective on zero-shot open set recognition and active learning.

Abstract

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve…
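Two of the quantities the abstract names, a proper scoring rule (log-likelihood) and selective prediction, can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the paper's evaluation code: the function names `negative_log_likelihood` and `selective_accuracy`, the use of the maximum class probability as the confidence score, and the toy data are all hypothetical choices made here.

```python
import math


def negative_log_likelihood(probs, labels):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    eps = 1e-12  # guard against log(0)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


def selective_accuracy(probs, labels, coverage):
    """Accuracy on the most confident `coverage` fraction of examples.

    Assumption: confidence is the maximum predicted class probability.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    scored = []
    for p, y in zip(probs, labels):
        confidence = max(p)
        prediction = p.index(confidence)
        scored.append((confidence, prediction == y))
    # Keep the most confident examples; the rest are treated as abstentions.
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = max(1, math.ceil(coverage * len(scored)))
    kept = scored[:keep]
    return sum(correct for _, correct in kept) / len(kept)


# Hypothetical toy predictions for a two-class problem.
probs = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
labels = [0, 1, 0, 1]

nll = negative_log_likelihood(probs, labels)
acc_full = selective_accuracy(probs, labels, coverage=1.0)  # 0.75
acc_half = selective_accuracy(probs, labels, coverage=0.5)  # 1.0
```

In this toy case the only error is a low-confidence prediction, so abstaining on the least confident half raises accuracy from 0.75 to 1.0. A model whose confidence tracks its correctness shows this kind of gain as coverage is reduced.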

Code & Models

Repositories

google/uncertainty-baselines · tf · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning