Ranked from Within: Ranking Large Multimodal Models Without Labels
Weijie Tu, Weijian Deng, Dylan Campbell, Yu Yao, Jiyang Zheng, Tom Gedeon, Tongliang Liu

TL;DR
This paper proposes an unsupervised method to rank large multimodal models based on their own uncertainty estimates, eliminating the need for labeled data and enabling efficient model selection across diverse tasks.
Contribution
It introduces an approach that uses uncertainty scores derived from softmax distributions to predict the relative performance of models without labels, evaluated on visual question answering benchmarks.
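As a rough illustration of what a softmax-derived uncertainty score can look like, the sketch below averages two common signals over unlabeled samples: the maximum softmax probability and the entropy of the distribution. The choice of these two signals, the function names, and the toy logits are assumptions made for illustration; the summary does not state which specific scores the paper uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def model_uncertainty_scores(all_logits):
    """Average label-free confidence signals over a set of samples.

    all_logits: one list of answer-option logits per unlabeled sample.
    Both returned scores are illustrative choices, not the paper's definition.
    """
    probs = [softmax(sample) for sample in all_logits]
    n = len(probs)
    return {
        "mean_max_prob": sum(max(p) for p in probs) / n,
        "mean_entropy": sum(entropy(p) for p in probs) / n,
    }


# Hypothetical logits over four answer options for two samples per model.
confident_logits = [[4.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0]]
unsure_logits = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.2, 0.0, 0.0]]

scores_confident = model_uncertainty_scores(confident_logits)
scores_unsure = model_uncertainty_scores(unsure_logits)
```

No labels enter the computation: a model whose distributions are peaked gets a higher mean maximum probability and a lower mean entropy than one whose distributions are nearly flat.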
Findings
Uncertainty scores effectively predict model performance across tasks.
The method works without labeled data, saving annotation effort.
Unsupervised ranking correlates well with traditional performance metrics.
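One way to check how well a label-free ranking agrees with a labeled one is a rank correlation such as Spearman's rho. The sketch below is a minimal, dependency-free version; the model names and all numbers are made up for illustration and are not results from the paper, and Spearman's rho is an assumed choice of agreement measure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n in ascending order, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model scores (illustrative only).
confidence = {"model_a": 0.81, "model_b": 0.64, "model_c": 0.72, "model_d": 0.55}
accuracy = {"model_a": 0.78, "model_b": 0.66, "model_c": 0.61, "model_d": 0.50}

names = sorted(confidence)
rho = spearman([confidence[n] for n in names], [accuracy[n] for n in names])

# Label-free ranking: most confident model first.
ranking = sorted(names, key=lambda n: confidence[n], reverse=True)
```

A rho close to 1 means the confidence-based ordering nearly reproduces the accuracy-based one; the labeled accuracies are needed only to validate the proxy, not to produce the ranking itself.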
Abstract
Can the relative performance of a pre-trained large multimodal model (LMM) be predicted without access to labels? As LMMs proliferate, it becomes increasingly important to develop efficient ways to choose between them when faced with new data or tasks. The usual approach does the equivalent of giving the models an exam and marking them. We opt to avoid marking and the associated labor of determining the ground-truth answers. Instead, we explore other signals elicited and ascertain how well the models know their own limits, evaluating the effectiveness of these signals at unsupervised model ranking. We evaluate state-of-the-art LMMs (e.g., LLaVA) across visual question answering benchmarks, analyzing how well uncertainty-based metrics can predict relative model performance. Our findings show that uncertainty scores derived from softmax distributions provide a robust and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Softmax
