Mirror, Mirror on the Wall -- Which is the Best Model of Them All?
Dina Sayed, Heiko Schuldt

TL;DR
This paper analyzes the current landscape of large language models through leaderboards and benchmarks, focusing on the medical domain, and introduces a systematic Model Selection Methodology (MSM) to aid in choosing the most suitable model for specific tasks.
Contribution
It provides a detailed analysis of quantitative evaluation metrics for LLMs and proposes a new systematic approach for model selection tailored to particular use cases.
Findings
Leaderboards effectively rank models based on standardized benchmarks.
The medical domain case study illustrates the evolution and significance of quantitative evaluations.
The proposed MSM helps in systematically selecting models aligned with specific needs.
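To make the idea of use-case-driven selection concrete, here is a minimal sketch that ranks candidate models by a weighted average of benchmark scores. It is not the paper's MSM, whose steps are not described on this page. The `Candidate` class, the `rank_candidates` function, the model names, the benchmark names, and all scores are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate model with per-benchmark scores in [0, 1]."""
    name: str
    scores: dict[str, float]


def rank_candidates(candidates, weights):
    """Return (name, weighted_score) pairs sorted best-first.

    `weights` maps a benchmark name to its relative importance for the
    use case. A benchmark missing from a candidate's scores counts as 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    ranked = [
        (c.name, sum(w * c.scores.get(b, 0.0) for b, w in weights.items()) / total)
        for c in candidates
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up models and scores, for illustration only.
candidates = [
    Candidate("model_a", {"medical_qa": 0.85, "general_reasoning": 0.60}),
    Candidate("model_b", {"medical_qa": 0.70, "general_reasoning": 0.90}),
]

# The same candidates ranked under two different use-case weightings.
medical_ranking = rank_candidates(
    candidates, {"medical_qa": 3.0, "general_reasoning": 1.0}
)
general_ranking = rank_candidates(
    candidates, {"medical_qa": 1.0, "general_reasoning": 3.0}
)
```

With these made-up numbers, the medical weighting puts `model_a` first and the general weighting puts `model_b` first, which illustrates why a single leaderboard position may not settle the choice for a specific use case.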
Abstract
Large Language Models (LLMs) have become one of the most transformative tools across many applications, as they have significantly boosted productivity and achieved impressive results in domains such as finance, healthcare, education, telecommunications, and law. Typically, state-of-the-art (SOTA) foundation models are developed by large corporations, which have the large data collections and the substantial computational and financial resources required to pretrain such models from scratch. These foundation models then serve as the basis for further development and domain adaptation for specific use cases or tasks. However, given the fast pace at which new foundation models are launched, selecting the most suitable model for a particular use case, application, or domain becomes increasingly complex. We argue that there are two main dimensions that…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
