EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang; Tianhao Wu; Zhaojin Wen; Andrew Li; Jiantao Jiao; Kannan Ramchandran

arXiv:2410.02223·cs.CL·October 18, 2024

TL;DR

EmbedLLM introduces a novel framework for learning compact vector representations of large language models, enabling efficient model routing and performance prediction across multiple benchmarks without extra inference costs.

Contribution

The paper presents EmbedLLM, a new encoder-decoder approach for creating embeddings of LLMs, improving model routing accuracy and latency, and enabling performance forecasting without additional inference.

Findings

01

EmbedLLM outperforms prior methods in model routing accuracy and latency.

02

The embeddings can predict model performance on benchmarks without extra inference.

03

Probing shows embeddings capture key model characteristics, like coding specialization.
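
The first two findings can be illustrated with a minimal sketch. It assumes a predictor already exists that returns, for each question, the probability that each model answers correctly. The model names, the probability values, and the helper functions `route` and `forecast_accuracy` are hypothetical and are not taken from the paper or its repository; the paper's actual router may differ.

```python
from typing import Dict, List


def route(predicted: Dict[str, float]) -> str:
    """Return the model with the highest predicted chance of a correct answer."""
    return max(predicted, key=predicted.get)


def forecast_accuracy(probabilities: List[float]) -> float:
    """Estimate benchmark accuracy as the mean predicted correctness."""
    return sum(probabilities) / len(probabilities)


# Hypothetical predicted correctness probabilities, one dict per question.
predictions = [
    {"model_a": 0.9, "model_b": 0.4},
    {"model_a": 0.2, "model_b": 0.7},
    {"model_a": 0.6, "model_b": 0.5},
]

# Routing: choose one model per question without running any LLM.
routed = [route(p) for p in predictions]

# Forecasting: average the predictions per model over the whole question set.
forecast = {
    name: forecast_accuracy([p[name] for p in predictions])
    for name in predictions[0]
}
```

Both operations read only the predicted probabilities, which is the sense in which neither requires extra LLM inference.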

Abstract

With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which leads to inefficiencies in both time and computational resources. To address this, we propose EmbedLLM, a framework designed to learn compact vector representations of LLMs that facilitate downstream applications involving many models, such as model routing. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on…
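
A toy sketch of the general idea follows, under loudly stated assumptions: each model ID gets a learned vector, and the "decoder" is simply a sigmoid over the dot product between that vector and a fixed question vector, trained to predict whether the model answered the question correctly. The question vectors, the training records, and the function names are invented for illustration; the paper's actual architecture and training setup are not reproduced here.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(emb, question_vecs, model_id: int, q_id: int) -> float:
    """Predicted probability that model `model_id` answers question `q_id` correctly."""
    score = sum(e * x for e, x in zip(emb[model_id], question_vecs[q_id]))
    return sigmoid(score)


def train_model_embeddings(records, question_vecs, num_models, dim,
                           epochs=200, lr=0.5, seed=0):
    """Learn one vector per model ID with SGD on binary cross-entropy."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_models)]
    for _ in range(epochs):
        for model_id, q_id, label in records:
            q = question_vecs[q_id]
            grad = predict(emb, question_vecs, model_id, q_id) - label
            # Gradient of the loss w.r.t. the model vector is (p - y) * q.
            emb[model_id] = [e - lr * grad * x for e, x in zip(emb[model_id], q)]
    return emb


# Hypothetical fixed question vectors (assumption: question 0 and question 1
# come from two different domains).
question_vecs = [[1.0, 0.0], [0.0, 1.0]]

# (model_id, question_id, answered_correctly) triples, invented for illustration.
records = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

embeddings = train_model_embeddings(records, question_vecs, num_models=2, dim=2)
p_model0_q0 = predict(embeddings, question_vecs, 0, 0)
p_model0_q1 = predict(embeddings, question_vecs, 0, 1)
```

After training, each model's vector points toward the questions it answered correctly, so the learned vectors separate the two toy models by domain.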

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Embedding LLMs to handle downstream tasks is indeed a fascinating approach! This method allows you to create compact representations of each model that capture its unique strengths and weaknesses, enabling efficient task-specific decisions without running each model on every input. This approach streamlines the workflow significantly, as it allows for general-purpose embeddings that can adapt to a variety of downstream tasks without retraining the models themselves. It's especially beneficial in

Weaknesses

The term "decoder" in this paper is a bit misleading. In typical encoder-decoder architectures, the "decoder" reconstructs or generates the output in its full or intended form, such as reconstructing text in sequence-to-sequence tasks. Here, however, the so-called "decoder" is merely a binary classifier that outputs a label indicating whether the LLM correctly answered a question. We have to re-train the embedder if we want to represent new models, which makes the whole framework non-scalable. I'

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The paper innovatively proposes the embedding of LLMs to facilitate managing and comparing them.
- The experiments in the paper are comprehensive, tested on 112 large models

Weaknesses

- The paper proposes a method for encoding LLMs. However, in the implementation, this encoding is merely based on model IDs, treating each model entirely as a black box. With only 30,000 data points for training, can the resulting encoding truly capture all the characteristics of the models? Large models differ significantly in their strengths across various domains and capabilities. Can such an approach, based solely on single-round question-answer pairs, truly distinguish the models’ abilities when faci

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. A simple and intuitive method.
2. Once the embedding of an LLM is built, the performance of the LLM can be assessed without access to the model.

Weaknesses

1. The paper can be motivated better and clarify why the current formulation is intuitive, i.e., a model's behavior on new data points can be predicted based on its behavior on already existing data points.
2. Details are hard to follow or underspecified, e.g., kNN classifier, random routing.
3. There are no references to highly similar work that predicts performance on a new task based on the performance of existing tasks. For example, Xia et al. Predicting Performance for Natural Language Proc

Code & Models

Repositories

richardzhuang0412/embedllm
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques