ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, and Jianfeng Gao

TL;DR
ELEVATER is a comprehensive benchmark and toolkit designed to evaluate the transferability of language-augmented visual models across multiple datasets and tasks, addressing the lack of standardized evaluation tools.
Contribution
It introduces the first unified benchmark and toolkit for assessing pre-trained language-augmented visual models on diverse downstream tasks.
Findings
Effective evaluation of transferability across datasets
Automated hyper-parameter tuning for model assessment
Metrics for sample-efficiency and parameter-efficiency
Abstract
Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
