Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements
Silvia Terragni, Hoang Cuong, Joachim Daiber, Pallavi Gudipati, and Pablo N. Mendes

TL;DR
This paper evaluates various large language and multimodal models for search relevance, analyzing their cost-accuracy trade-offs and context-dependent performance to guide practical model selection.
Contribution
It provides a comprehensive assessment of LLMs and MLLMs in multimodal search relevance, highlighting performance variability and cost considerations.
Findings
Model performance varies significantly across contexts.
In smaller models, including a visual component may hinder performance rather than improve it.
Performance trade-offs depend on specific use cases.
Abstract
Large Language Models (LLMs) have demonstrated potential as effective search relevance evaluators. However, there is a lack of comprehensive guidance on which models consistently perform optimally across various contexts or within specific use cases. In this paper, we assess several LLMs and Multimodal Language Models (MLLMs) in terms of their alignment with human judgments across multiple multimodal search scenarios. Our analysis investigates the trade-offs between cost and accuracy, highlighting that model performance varies significantly depending on the context. Interestingly, in smaller models, the inclusion of a visual component may hinder performance rather than enhance it. These findings highlight the complexities involved in selecting the most appropriate model for practical applications.
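As a rough illustration of the kind of comparison the abstract describes, the sketch below scores a model's relevance labels against human labels and then keeps only the candidates that are not beaten on both cost and agreement. It rests on assumptions that are not from the paper: Cohen's kappa stands in for "alignment with human judgments" (the abstract does not name a metric), and the function names, model names, and cost figures are hypothetical placeholders.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Cohen's kappa between two equal-length label sequences.

    Assumption: kappa is used here only as one common agreement
    measure; the paper's actual metric is not stated in the abstract.
    """
    if len(human) != len(model) or not human:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(human)
    # Fraction of items where the two raters agree.
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Agreement expected by chance from each rater's label frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    expected = sum(h_counts[label] * m_counts[label] for label in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def pareto_front(candidates):
    """Return names of candidates not dominated on (cost, agreement).

    candidates: list of (name, cost, agreement) tuples (hypothetical values).
    A candidate is dominated if another one is no more expensive and no less
    accurate, and strictly better on at least one of the two.
    """
    front = []
    for name, cost, agreement in candidates:
        dominated = any(
            c2 <= cost and a2 >= agreement and (c2 < cost or a2 > agreement)
            for _, c2, a2 in candidates
        )
        if not dominated:
            front.append(name)
    return front
```

With per-model costs and agreement scores in hand, `pareto_front` narrows the choice to models that represent a genuine trade-off. Which of those to pick still depends on the use case, which is the point the abstract makes.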
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior · Sentiment Analysis and Opinion Mining
