Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould

TL;DR
This paper introduces a probabilistic matrix factorization framework using MCMC to predict performance scores of large vision-language models across various tasks, reducing evaluation costs and estimating uncertainties.
Contribution
It proposes a novel matrix completion approach with enhancements for sparse data to accurately predict model performances and their uncertainties across tasks.
Findings
High accuracy in performance prediction
Reliable uncertainty estimates for model evaluation
Effective handling of sparse observed data
Abstract
Evaluating large vision-language models (LVLMs) is very expensive, due to high computational cost and the wide variety of tasks. The good news is that if we already have some observed performance scores, we may be able to infer unknown ones. In this study, we propose a new framework for predicting unknown performance scores based on observed ones from other LVLMs or tasks. We first formulate the performance prediction as a matrix completion task. Specifically, we construct a sparse performance matrix , where each entry represents the performance score of the -th model on the -th dataset. By applying probabilistic matrix factorization (PMF) with Markov chain Monte Carlo (MCMC), we can complete the performance matrix, i.e., predict unknown scores. Additionally, we estimate the uncertainty of performance prediction based on MCMC. Practitioners can evaluate…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and dialogue systems · Natural Language Processing Techniques
