Learning on Model Weights using Tree Experts

Eliahu Horwitz; Bar Cavia; Jonathan Kahana; Yedid Hoshen

arXiv:2410.13569 · cs.LG · June 4, 2025

Open Access · 10 models · 1 dataset

TL;DR

This paper introduces ProbeX, a lightweight probing method that leverages the structure of Model Trees to classify models and predict their training data categories using only model weights, enabling zero-shot model search.
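To make "zero-shot model search" concrete, here is a minimal sketch of the retrieval step only. It assumes a trained encoder has already mapped each model's weights into the same embedding space as a text encoder. The names `zero_shot_search` and `l2_normalize` are hypothetical, and the embeddings below are random toy vectors, not ProbeX outputs.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_search(weight_emb: np.ndarray, text_emb: np.ndarray, k: int = 1) -> np.ndarray:
    """Rank models for each text query by cosine similarity (hypothetical helper).

    weight_emb: (num_models, d) embeddings of model weights from an assumed encoder.
    text_emb:   (num_queries, d) embeddings of category names from an assumed text encoder.
    Returns:    (num_queries, k) indices of the top-k models for each query.
    """
    sims = l2_normalize(text_emb) @ l2_normalize(weight_emb).T  # (num_queries, num_models)
    return np.argsort(-sims, axis=1)[:, :k]

# Toy usage: three random "category" embeddings and three models, each built to lie
# close to one category (model 0 near category 2, model 1 near 0, model 2 near 1).
rng = np.random.default_rng(0)
text = rng.normal(size=(3, 8))
models = text[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 8))
top1 = zero_shot_search(models, text)
print(top1.ravel())
```

Ranking by cosine similarity means a new category can be searched for without retraining anything, since only its text embedding is needed.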

Contribution

The paper presents ProbeX, a novel probing technique designed specifically for the weights of a single model layer, and demonstrates its effectiveness in classifying models and predicting their training data.
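The following is a hypothetical forward pass in the spirit of a probing encoder for one layer: a set of probe vectors is multiplied by the layer's weight matrix, each probe's response is reduced by a shared projection, and a head maps the result to an embedding. The class name `ProbeEncoderSketch`, the shapes, the ReLU, and the random (untrained) parameters are all assumptions for illustration; the paper is the authority on ProbeX's actual architecture and training objective.

```python
import numpy as np

class ProbeEncoderSketch:
    """Hypothetical single-layer weight encoder built from learned input probes."""

    def __init__(self, d_in, d_out, n_probes=16, d_proj=8, d_emb=32, seed=0):
        rng = np.random.default_rng(seed)
        # Probe vectors; in a real model these would be learned (random here).
        self.probes = rng.normal(size=(d_in, n_probes)) / np.sqrt(d_in)
        # One projection shared by all probes, applied to each probe's response.
        self.proj = rng.normal(size=(d_out, d_proj)) / np.sqrt(d_out)
        # Head mapping the concatenated projected responses to the output embedding.
        self.head = rng.normal(size=(n_probes * d_proj, d_emb)) / np.sqrt(n_probes * d_proj)

    def __call__(self, W):
        # W has shape (d_out, d_in), following the linear-layer convention y = W @ x.
        responses = W @ self.probes                              # (d_out, n_probes)
        projected = np.maximum(responses.T @ self.proj, 0.0)     # (n_probes, d_proj), ReLU
        return projected.reshape(-1) @ self.head                 # (d_emb,)

# Usage on a random stand-in for one layer of a model.
rng = np.random.default_rng(1)
W = rng.normal(size=(32, 64))
enc = ProbeEncoderSketch(d_in=64, d_out=32)
z = enc(W)
print(z.shape)
```

Because the encoder only touches `W` through the products `W @ probes`, its parameter count depends on `d_in`, `d_out`, and the number of probes rather than on a dense map over all `d_in * d_out` weights.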

Findings

01. ProbeX accurately predicts training data categories from model weights.

02. Model weights within the same Model Tree show less nuisance variation than weights across different trees.

03. ProbeX enables zero-shot model classification via a weight-language embedding.
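As an illustration of nuisance variation, the sketch below uses one standard example chosen here for clarity (the summary above does not list specific sources of nuisance): permuting the hidden units of a two-layer ReLU network changes its weights but not its function. Models fine-tuned from a common ancestor plausibly inherit its unit ordering, which is one intuitive reason weights within a Model Tree are easier to compare.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 5, 7, 3
W1 = rng.normal(size=(d_hidden, d_in))   # first layer: hidden x input
W2 = rng.normal(size=(d_out, d_hidden))  # second layer: output x hidden

def mlp(x, W1, W2):
    """Two-layer ReLU network without biases: W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# A fixed, non-identity reordering of the hidden units (shift every unit by one slot).
perm = np.roll(np.arange(d_hidden), 1)
W1_perm = W1[perm]      # reorder the rows of W1
W2_perm = W2[:, perm]   # and the matching columns of W2

# The permuted network computes the same outputs on every input we try...
xs = rng.normal(size=(10, d_in))
same_function = all(np.allclose(mlp(x, W1, W2), mlp(x, W1_perm, W2_perm)) for x in xs)
# ...even though its raw weights are different.
weight_gap = np.linalg.norm(W1 - W1_perm)
print(same_function, weight_gap > 0)
```

A model that reads raw, flattened weights would see these two networks as different points, even though they are functionally identical.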

Abstract

The number of publicly available models is rapidly increasing, yet most remain undocumented. Users looking for suitable models for their tasks must first determine what each model does. Training machine learning models to infer missing documentation directly from model weights is challenging, as these weights often contain significant variation unrelated to model functionality (denoted nuisance). Here, we identify a key property of real-world models: most public models belong to a small set of Model Trees, where all models within a tree are fine-tuned from a common ancestor (e.g., a foundation model). Importantly, we find that within each tree there is less nuisance variation between models. Concretely, while learning across Model Trees requires complex architectures, even a linear classifier trained on a single model layer often works within trees. While effective, these linear…
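A toy version of the within-tree setting follows: a minimal sketch, assuming each model in a tree equals the shared ancestor layer plus a class-dependent fine-tuning update plus unrelated noise. A ridge-regression classifier on one-hot targets (a stand-in chosen here for "a linear classifier") is trained on one flattened layer per model. The data are synthetic and the example does not reproduce the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_classes, d = 600, 3, 8 * 8   # each "model" contributes one flattened 8x8 layer

# Synthetic Model Tree (assumption): ancestor + class-dependent update + noise.
ancestor = rng.normal(size=d)
class_updates = 0.5 * rng.normal(size=(n_classes, d))
labels = rng.integers(0, n_classes, size=n_models)
X = ancestor + class_updates[labels] + 0.5 * rng.normal(size=(n_models, d))

def add_bias(X):
    """Append a constant column so the classifier can learn a per-class offset."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_linear_probe(X, y, n_classes, reg=1.0):
    """Ridge-regression classifier on one-hot targets (a simple linear classifier)."""
    Xb = add_bias(X)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)             # (d + 1, n_classes)

def predict(B, X):
    return np.argmax(add_bias(X) @ B, axis=1)

train, test = slice(0, 400), slice(400, None)
B = fit_linear_probe(X[train], labels[train], n_classes)
accuracy = np.mean(predict(B, X[test]) == labels[test])
print(f"held-out accuracy: {accuracy:.3f}")
```

The shared ancestor adds the same offset to every model in the tree, so a linear classifier with a bias term can absorb it; across trees with different ancestors, that offset would vary from model to model.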

Code & Models

Datasets

ProbeX/Model-J
dataset · 355 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Modeling and Causal Inference

Methods: Diffusion · Sparse Evolutionary Training