EmbedLLM: Learning Compact Representations of Large Language Models
Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran

TL;DR
EmbedLLM introduces a novel framework for learning compact vector representations of large language models, enabling efficient model routing and performance prediction across multiple benchmarks without extra inference costs.
Contribution
The paper presents EmbedLLM, a new encoder-decoder approach for creating embeddings of LLMs, improving model routing accuracy and latency, and enabling performance forecasting without additional inference.
Findings
EmbedLLM outperforms prior methods in model routing accuracy and latency.
The embeddings can predict model performance on benchmarks without extra inference.
Probing shows embeddings capture key model characteristics, like coding specialization.
Abstract
With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which leads to inefficiencies in both time and computational resources. To address this, we propose EmbedLLM, a framework designed to learn compact vector representations of LLMs that facilitate downstream applications involving many models, such as model routing. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on…
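One way to picture routing with such embeddings, as a minimal sketch and not the authors' implementation: assume a scorer that maps a model embedding and a question embedding to a probability of a correct answer, then send each question to the model with the highest score. The dot-product-plus-sigmoid scorer, the names `predict_correct` and `route`, and the toy vectors below are all assumptions made for illustration.

```python
import math


def predict_correct(model_emb, question_emb):
    """Hypothetical scorer: sigmoid of a dot product.

    This stands in for a learned correctness predictor; it is an
    assumption, not the paper's decoder.
    """
    score = sum(m * q for m, q in zip(model_emb, question_emb))
    return 1.0 / (1.0 + math.exp(-score))


def route(model_embs, question_emb):
    """Return the name of the model with the highest predicted correctness."""
    return max(
        model_embs,
        key=lambda name: predict_correct(model_embs[name], question_emb),
    )


# Toy 2-d embeddings, made up for illustration only.
model_embs = {
    "code_model": [2.0, -1.0],
    "chat_model": [-1.0, 2.0],
}
coding_question = [1.0, 0.0]
chat_question = [0.0, 1.0]

chosen_for_code = route(model_embs, coding_question)
chosen_for_chat = route(model_embs, chat_question)
```

Under these assumptions, routing needs only vector arithmetic per question rather than a forward pass through every candidate model, which is the kind of saving the abstract attributes to working with compact embeddings.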
Peer Reviews
Decision · ICLR 2025 Spotlight
Embedding LLMs to handle downstream tasks is indeed a fascinating approach! This method allows you to create compact representations of each model that capture its unique strengths and weaknesses, enabling efficient task-specific decisions without running each model on every input. This approach streamlines the workflow significantly, as it allows for general-purpose embeddings that can adapt to a variety of downstream tasks without retraining the models themselves. It's especially beneficial in
The term "decoder" in this paper is a bit misleading. In typical encoder-decoder architectures, the "decoder" reconstructs or generates the output in its full or intended form, such as reconstructing text in sequence-to-sequence tasks. Here, however, the so-called "decoder" is merely a binary classifier that outputs a label indicating whether the LLM correctly answered a question. We have to re-train the embedder if we want to represent new models, which makes the whole framework non-scalable. I'
- The paper innovatively proposes the embedding of LLMs to facilitate managing and comparing them.
- The experiments in the paper are comprehensive, tested on 112 large models
- The paper proposes a method for encoding LLMs. However, in the implementation, this encoding is merely based on model IDs, treating each model entirely as a black box. With only 30,000 data points for training, can the resulting encoding truly capture all the characteristics of the models? Large models differ significantly in their strengths across various domains and capabilities. Can such an approach, based solely on single-round question-answer pairs, truly distinguish the models’ abilities when faci
1. A simple and intuitive method.
2. Once the embedding of an LLM is built, the performance of the LLM can be assessed without access to the model.
1. The paper can be motivated better and clarify why the current formulation is intuitive, i.e., a model's behavior on new data points can be predicted based on its behavior on already existing data points.
2. Details are hard to follow or underspecified, e.g., kNN classifier, random routing.
3. There are no references to highly similar work that predicts performance on a new task based on the performance of existing tasks. For example, Xia et al. Predicting Performance for Natural Language Proc
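The setup the reviews describe (one embedding per model ID, plus a binary classifier that predicts whether a model answers a question correctly) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's architecture: question embeddings are taken as fixed inputs, the classifier is a sigmoid over a dot product, training is plain stochastic gradient descent on log loss, and every name and data point is hypothetical.

```python
import math
import random


def sigmoid(x):
    # Clamp the logit to avoid overflow in math.exp.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_model_embeddings(records, num_models, dim, epochs=200, lr=0.1, seed=0):
    """Learn one vector per model ID from (model_id, question_emb, label) records.

    label is 1 if the model answered the question correctly, else 0.
    The classifier sigmoid(dot(model_emb, question_emb)) is an assumption.
    """
    rng = random.Random(seed)
    # Embedding table indexed by model ID; the model itself is a black box.
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_models)]
    for _ in range(epochs):
        for model_id, q, label in records:
            emb = table[model_id]
            p = sigmoid(sum(e * x for e, x in zip(emb, q)))
            grad = p - label  # gradient of log loss with respect to the logit
            for i in range(dim):
                emb[i] -= lr * grad * q[i]
    return table


def predict(table, model_id, q):
    """Predicted probability that model `model_id` answers question `q` correctly."""
    return sigmoid(sum(e * x for e, x in zip(table[model_id], q)))


# Toy data: model 0 is right on "coding" questions [1, 0],
# model 1 is right on "chat" questions [0, 1].
records = [
    (0, [1.0, 0.0], 1), (0, [0.0, 1.0], 0),
    (1, [1.0, 0.0], 0), (1, [0.0, 1.0], 1),
]
table = train_model_embeddings(records, num_models=2, dim=2)
```

In this sketch, representing a new model means adding a row to the table and fitting it on that model's recorded answers, which is the retraining cost one of the reviews points to.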
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
