Can We Predict Performance of Large Models across Vision-Language Tasks?

Qinyu Zhao; Ming Xu; Kartik Gupta; Akshay Asthana; Liang Zheng; Stephen Gould

arXiv:2410.10112·cs.CV·May 30, 2025

TL;DR

This paper introduces a probabilistic matrix factorization framework using MCMC to predict performance scores of large vision-language models across various tasks, reducing evaluation costs and estimating uncertainties.

Contribution

The paper casts LVLM performance prediction as a matrix completion problem and adds enhancements for sparse data, so that unknown scores and their uncertainties can be predicted across models and tasks.

Findings

01 High accuracy in performance prediction

02 Reliable uncertainty estimates for model evaluation

03 Effective handling of sparse observed data

Abstract

Evaluating large vision-language models (LVLMs) is very expensive, due to high computational cost and the wide variety of tasks. The good news is that if we already have some observed performance scores, we may be able to infer unknown ones. In this study, we propose a new framework for predicting unknown performance scores based on observed ones from other LVLMs or tasks. We first formulate the performance prediction as a matrix completion task. Specifically, we construct a sparse performance matrix $R$, where each entry $R_{mn}$ represents the performance score of the $m$-th model on the $n$-th dataset. By applying probabilistic matrix factorization (PMF) with Markov chain Monte Carlo (MCMC), we can complete the performance matrix, i.e., predict unknown scores. Additionally, we estimate the uncertainty of performance prediction based on MCMC. Practitioners can evaluate…
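To make the matrix completion idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a Gaussian PMF model with fixed noise precision `alpha` and prior precision `lam`, and uses a simple Gibbs sampler as the MCMC method; the paper's actual sampler, priors, and sparse-data enhancements may differ. The function names, hyperparameter values, and the synthetic matrix are all illustrative assumptions, and no real benchmark scores are used.

```python
import numpy as np

def sample_factors(R, mask, other, alpha, lam, rng):
    """One Gibbs step: redraw every row's latent factor given the other side."""
    k = other.shape[1]
    out = np.empty((R.shape[0], k))
    for i in range(R.shape[0]):
        obs = mask[i]                          # observed entries in row i
        V = other[obs]                         # factors of the observed columns
        precision = lam * np.eye(k) + alpha * V.T @ V
        mean = np.linalg.solve(precision, alpha * V.T @ R[i, obs])
        # Draw from N(mean, precision^-1) via the Cholesky factor of the precision.
        L = np.linalg.cholesky(precision)
        out[i] = mean + np.linalg.solve(L.T, rng.standard_normal(k))
    return out

def pmf_gibbs(R, mask, k=2, alpha=25.0, lam=1.0, n_samples=300, burn_in=100, seed=0):
    """Return the posterior mean and std of the completed matrix U @ V.T."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(R.shape[1], k))
    draws = []
    for t in range(burn_in + n_samples):
        U = sample_factors(R, mask, V, alpha, lam, rng)      # model factors
        V = sample_factors(R.T, mask.T, U, alpha, lam, rng)  # dataset factors
        if t >= burn_in:
            draws.append(U @ V.T)
    draws = np.stack(draws)
    return draws.mean(axis=0), draws.std(axis=0)

# Synthetic stand-in for the performance matrix (assumed data, not real scores).
rng = np.random.default_rng(1)
true_R = rng.normal(size=(20, 2)) @ rng.normal(size=(15, 2)).T
mask = rng.random(true_R.shape) < 0.7          # True where a score is "observed"
R = np.where(mask, true_R + 0.2 * rng.normal(size=true_R.shape), np.nan)

pred_mean, pred_std = pmf_gibbs(R, mask)
rmse = np.sqrt(np.mean((pred_mean - true_R)[~mask] ** 2))
print(f"held-out RMSE: {rmse:.3f}, mean predictive std: {pred_std[~mask].mean():.3f}")
```

Here `pred_mean` plays the role of the completed matrix and `pred_std` is the spread of the sampled low-rank reconstructions, which serves as a rough per-entry uncertainty. It does not include observation noise, so it understates the full predictive spread.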

Code & Models

Repositories

qinyu-allen-zhao/crosspred-lvlm
PyTorch · Official

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques