Deep Linear Probe Generators for Weight Space Learning

Jonathan Kahana; Eliahu Horwitz; Imri Shuval; Yedid Hoshen

arXiv:2410.10811 · cs.LG · October 23, 2025

TL;DR

This paper introduces Deep Linear Probe Generators (ProbeGen), a novel method that enhances probing techniques for weight space learning by reducing overfitting and improving efficiency in extracting neural network information.

Contribution

ProbeGen is a simple, effective modification to probing that incorporates a deep linear generator, significantly outperforming state-of-the-art methods while being computationally more efficient.

Findings

1. ProbeGen outperforms existing methods in accuracy.
2. It requires 30 to 1000 times fewer FLOPs than the state-of-the-art methods it is compared against.
3. It effectively reduces overfitting in probing approaches.

Abstract

Weight space learning aims to extract information about a neural network, such as its training dataset or generalization error. Recent approaches learn directly from model weights, but this presents many challenges as weights are high-dimensional and include permutation symmetries between neurons. An alternative approach, probing, represents a model by passing a set of learned inputs (probes) through the model, and training a predictor on top of the corresponding outputs. Although probing is typically not used as a stand-alone approach, our preliminary experiment found that a vanilla probing baseline worked surprisingly well. However, we discover that current probe learning strategies are ineffective. We therefore propose Deep Linear Probe Generators (ProbeGen), a simple and effective modification to probing approaches. ProbeGen adds a shared generator module with a deep linear…
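The probing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy tanh MLP, the layer sizes, and every name (`mlp`, `deep_linear_generator`, `probe_features`) are assumptions made for illustration, and all parameters are random rather than trained. The sketch shows only the forward pass: a shared stack of linear layers with no activations maps per-probe latent codes to probes, the probes are passed through the model, and the concatenated outputs form the features a predictor would consume.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Toy stand-in for a model under analysis (hypothetical architecture)."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def deep_linear_generator(latents, layers):
    """Map latent codes to probes through stacked linear layers, no activations."""
    out = latents
    for A in layers:
        out = out @ A
    return out

def probe_features(params, probes):
    """Represent a model by its flattened outputs on the probes."""
    return mlp(params, probes).reshape(-1)

# Illustrative sizes only; none of these come from the paper.
in_dim, hidden, out_dim = 2, 16, 3
num_probes, latent_dim = 8, 4

W1 = rng.normal(size=(in_dim, hidden))
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(hidden, out_dim))
b2 = rng.normal(size=out_dim)
params = (W1, b1, W2, b2)

# In a real probing pipeline the latents and generator would be trained
# jointly with the predictor; here they are random placeholders.
latents = rng.normal(size=(num_probes, latent_dim))
layers = [
    rng.normal(size=(latent_dim, 8)),
    rng.normal(size=(8, 8)),
    rng.normal(size=(8, in_dim)),
]

probes = deep_linear_generator(latents, layers)
features = probe_features(params, probes)

# Permuting hidden neurons changes the weights but not the function,
# so the probe-based representation is unchanged.
perm = rng.permutation(hidden)
permuted_params = (W1[:, perm], b1[perm], W2[perm, :], b2)
features_perm = probe_features(permuted_params, probes)

# The stacked linear layers are equivalent to one linear map; the depth
# changes the parametrization, not the set of reachable probes.
collapsed = layers[0] @ layers[1] @ layers[2]

print(features.shape)
print(np.allclose(features, features_perm))
print(np.allclose(probes, latents @ collapsed))
```

The permutation check illustrates why probing sidesteps the neuron symmetries mentioned in the abstract: the representation depends only on the function the model computes. The collapse check shows that a deep linear generator is no more expressive than a single linear layer, so any benefit would have to come from how the deeper parametrization behaves during training.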

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

- This work addresses the novel and intriguing field of weight space learning, aiming to tackle the significant challenge of simplifying optimization within this domain.
- The proposed method ProbeGen is simple, easy to follow, and yet effective.
- The authors empirically demonstrate the motivation behind ProbeGen, providing interesting insights.
- The paper is well-written and easy to follow.

Weaknesses

- The experimental part is limited. For an empirically oriented paper, I expect the experimental section to be more comprehensive. For instance, the effectiveness of ProbeGen is shown only on INRs, overlooking equivariant tasks in weight space learning like domain adaptation [1], alignment [2], editing [3], and more.
- The authors mentioned that ProbeGen could be used for processing high-dimensional inputs; however, the experimental section focuses on small-scale INRs instead of modern architectur

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper is well written and is quite easy to follow. The connections with binary code analysis and the corresponding insights are very useful towards more scalable weight-space methods. The proposed method reaches state-of-the-art performance with high computational efficiency. The authors provide a comprehensive set of ablation studies that offers significant insights on probing-based methods for weight space learning.

Weaknesses

The proposed method cannot be used in weight-level tasks, such as editing neural network weights, which represents a large class of tasks in learning in weight spaces. The impact of this work would greatly increase if the authors include a more detailed discussion on why that is the case, and potential directions to alleviate this limitation. It is unclear if the proposed method can be used to complement existing mechanistic approaches and result in a performance greater than either of the two

Reviewer 03 · Rating 6 · Confidence 4

Strengths

* The idea of using a deep linear network to obtain implicit regularization while guarding against potential overfitting is nice and original in that context.
* The proposed approach is far more efficient than methods that learn directly on the weights.
* The paper is written clearly and adequately. The authors empirically justify their design choices every step of the way and show other alternatives (such as learned vs unlearned probes, and the architecture decisions about the gener

Weaknesses

* The authors frankly discussed the main limitations of their approach, which I agree with. In particular, there are some tasks on which it is not immediately apparent how to use linear probing, such as generative tasks. It may be valuable if the authors could come up with possible remedies for that, but I understand if they don't, since it is beyond the scope of this work and may be non-trivial.
* Experiments:
  * Section 5.2 demonstrates the importance of deep linear classifiers compared

Taxonomy

Topics: Hand Gesture Recognition Systems · Advanced Fiber Optic Sensors · Speech and Audio Processing

Methods: Sparse Evolutionary Training