Model Diffusion for Certifiable Few-shot Transfer Learning

Fady Rezk; Royson Lee; Henry Gouk; Timothy Hospedales; Minyoung Kim

arXiv:2502.06970 · cs.LG · May 29, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel transfer learning method that uses diffusion models to generate a finite set of parameter-efficient fine-tuning options, enabling certifiable generalization guarantees in low-data scenarios.

Contribution

It develops a diffusion-based approach to transfer learning that provides non-vacuous learning-theoretic generalization guarantees even in low-shot settings, which standard PEFT workflows do not offer.

Findings

01. Provides tighter risk bounds than existing approaches.

02. Demonstrates non-trivial generalization guarantees in low-shot transfer learning.

03. Uses a finite set of sampled PEFT parameters to make learning certifiable.

Abstract

In contemporary deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy - which may be required for ethical or legal reasons prior to deployment in high-importance applications. In this paper we develop a novel transfer learning approach that is designed to facilitate non-vacuous learning theoretic generalisation guarantees for downstream tasks, even in the low-shot regime. Specifically, we first use upstream tasks to train a distribution over PEFT parameters. We then learn the downstream task by a sample-and-evaluate procedure -- sampling plausible PEFTs from the trained diffusion model and selecting the one with the highest…
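The sample-and-evaluate step described in the abstract can be sketched as follows. This is a minimal sketch under loudly stated assumptions: a toy linear classifier stands in for the PEFT-adapted foundation model, Gaussian perturbations of a reference weight vector stand in for samples from the trained diffusion model, the truncated selection criterion is read as "lowest empirical 0-1 risk on the downstream training set", and the certificate is the textbook Hoeffding-plus-union bound for a finite hypothesis set, which may be looser than the bound the paper actually uses. The names `sample_and_evaluate` and `hoeffding_finite_bound` are hypothetical.

```python
import math

import numpy as np


def hoeffding_finite_bound(emp_risk: float, n: int, num_candidates: int, delta: float) -> float:
    """Certificate that holds for every candidate in a finite set simultaneously.

    Hoeffding's inequality plus a union bound over M = num_candidates hypotheses:
    with probability >= 1 - delta, true 0-1 risk <= emp_risk + sqrt(ln(M / delta) / (2n)).
    """
    complexity = math.sqrt(math.log(num_candidates / delta) / (2 * n))
    return min(emp_risk + complexity, 1.0)  # a 0-1 risk can never exceed 1


def sample_and_evaluate(candidates: np.ndarray, X: np.ndarray, y: np.ndarray, delta: float = 0.05):
    """Select the candidate with the lowest empirical risk and return its certificate."""
    # Hypothetical model: each row of `candidates` is a weight vector w of sign(X @ w).
    preds = np.sign(X @ candidates.T)           # shape (n, M)
    risks = (preds != y[:, None]).mean(axis=0)  # empirical 0-1 risk of every candidate
    best = int(np.argmin(risks))
    # The complexity term depends only on M, n and delta, not on which candidate wins,
    # so picking the empirical-risk minimizer does not loosen the bound.
    risk = float(risks[best])
    return best, risk, hoeffding_finite_bound(risk, len(y), len(candidates), delta)


# Toy downstream task (stand-in data, not from the paper).
rng = np.random.default_rng(0)
w_true = np.ones(5)
X = rng.normal(size=(50, 5))
y = np.sign(X @ w_true)

# Stand-in for the upstream-trained diffusion model: perturbations of w_true.
# They are drawn without looking at (X, y), which the union bound requires.
candidates = w_true + 0.5 * rng.normal(size=(100, 5))

best, risk, bound = sample_and_evaluate(candidates, X, y)
print(f"candidate {best}: empirical risk {risk:.3f}, certified risk <= {bound:.3f}")
```

With n = 50, M = 100 and delta = 0.05 the complexity term is about 0.28 regardless of which candidate is selected; shrinking the candidate set or adding downstream samples is what tightens it.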

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper is well written and straightforward to follow, and STEEL itself is sensible: you utilize a diffusion model as a weight-candidate generator for your test-time PEFT in order to provide tighter (and actual) generalization bounds. The combination of PEFT hypernetwork and generalization bounds is, to the best of my knowledge, novel and sensible, with convincing results in Sections 5.1 and 5.2 on both LLM and vision-model adaptation.

Weaknesses

I do think this paper provides a very interesting use of PEFT hypernetworks for generalization bounds, which to me comes with one major question/issue: the whole approach hinges on L. 193, "We expect that Θ is rich enough to represent the true task distribution $p_{\text{true}}(T)$ faithfully, and the adapted (“selected”) θ will generalize well on unseen samples from T", which is a very, very strong statement to make. By default, STEEL is likely much more limited when it comes to adapting to larger distr…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- A neat combination of weight-space generative modeling (DDPM) to form a finite hypothesis set and evaluate-then-select to keep the complexity term fixed while minimizing empirical risk, yielding non-vacuous certificates in few-shot regimes.
- Experimental breadth & rigor. Evaluations span multiple LaMP tasks and vision datasets under standard meta-learning protocols, reporting not only accuracy but also % non-vacuous, bound statistics, and gap.
- The paper visualizes how certification varies wi…

Weaknesses

(W1) The paper fixes LoRA-XS (~2.6K params) and CoOp (1,024-dim prompt) without varying LoRA rank or token count. Given that both diffusion learnability and certification can depend on $\dim(\theta)$, a $\theta$-size sweep would strengthen the claims.

(W2) Fairness to model-zoo under matched search. Hierarchical search is an inference strategy, not unique to STEEL. A controlled comparison where model-zoo also uses the same k-means/medoid + top-15 pipeline would isolate the benefit of diffusion sampl…
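To make the matched-search comparison in W2 concrete, a k-means/medoid search over a candidate pool can be sketched as below. The function accepts any pool, so the same search could run on diffusion samples or on a model zoo. Assumptions, none confirmed by this page: candidates are flat parameter vectors, each cluster is represented by its medoid, and "top-15" is read as "keep the best clusters by medoid risk" (exposed here as the hypothetical parameter `top_k`). The toy linear classifier and all function names are illustrative.

```python
import numpy as np


def kmeans_labels(points: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels


def medoid(points: np.ndarray, members: np.ndarray) -> int:
    """Index of the member with the smallest total distance to the other members."""
    sub = points[members]
    pairwise = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    return int(members[pairwise.sum(axis=1).argmin()])


def hierarchical_search(candidates: np.ndarray, risk_fn, k: int = 10, top_k: int = 3):
    """Score one medoid per cluster, then every member of the top_k clusters."""
    labels = kmeans_labels(candidates, k)
    clusters = [np.flatnonzero(labels == j) for j in range(k)]
    clusters = [c for c in clusters if len(c) > 0]
    medoid_risks = [risk_fn(candidates[medoid(candidates, c)]) for c in clusters]
    shortlist = np.concatenate([clusters[i] for i in np.argsort(medoid_risks)[:top_k]])
    risks = [risk_fn(candidates[i]) for i in shortlist]
    best = int(shortlist[int(np.argmin(risks))])
    num_evals = len(clusters) + len(shortlist)
    return best, min(risks), num_evals


# Toy pool (stand-in data, not from the paper).
rng = np.random.default_rng(1)
w_true = np.ones(5)
X = rng.normal(size=(50, 5))
y = np.sign(X @ w_true)
candidates = w_true + 0.5 * rng.normal(size=(200, 5))


def zero_one_risk(w: np.ndarray) -> float:
    return float(np.mean(np.sign(X @ w) != y))


best, risk, num_evals = hierarchical_search(candidates, zero_one_risk)
print(f"candidate {best}: empirical risk {risk:.3f} after {num_evals} evaluations")
```

Because the returned candidate is still one of the M pooled candidates, a finite-set bound computed with the full M stays valid; the hierarchy only reduces how many candidates are scored, which is why it can be applied equally to either pool.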

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The paper provides some theoretical analysis, and the experiments indicate improved bounds and non-trivial generalization guarantees.

Weaknesses

1. The title in the submitted PDF differs from the one on OpenReview, which raises concerns.
2. The main contribution appears to be learning a parameter diffusion model to generate PEFTs according to the task distribution. However, the architecture and training of the diffusion model are unclear. Section 3.3 discusses its role only at a high level, stating it is trained on PEFT parameters {θi}, but omits key architectural details and training objectives. The paper seems to assume readers are alr…
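For orientation on the objective that point 2 asks about, the standard simplified DDPM noise-prediction loss (Ho et al., 2020) applied to flattened PEFT vectors can be sketched as below. Reviewer 02 identifies the generator as a DDPM, but everything else here is an assumption: the linear beta schedule uses the usual DDPM defaults, PEFT parameters are treated as flat vectors, and `zero_denoiser` is a placeholder for a trained network, not the paper's architecture.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t of a linear beta schedule (common DDPM defaults)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ddpm_loss(theta0: np.ndarray, denoiser, alpha_bar: np.ndarray, rng: np.random.Generator) -> float:
    """Simplified DDPM objective on a batch of flattened PEFT vectors theta0, shape (B, d)."""
    t = rng.integers(0, len(alpha_bar), size=len(theta0))    # one random timestep per example
    eps = rng.normal(size=theta0.shape)                       # the noise the model must predict
    ab = alpha_bar[t][:, None]
    theta_t = np.sqrt(ab) * theta0 + np.sqrt(1.0 - ab) * eps  # sample from q(theta_t | theta_0)
    return float(np.mean((denoiser(theta_t, t) - eps) ** 2))


def zero_denoiser(theta_t: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for a trained noise-prediction network."""
    return np.zeros_like(theta_t)


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
theta0 = rng.normal(size=(2000, 64))  # stand-in for 2,000 upstream PEFT vectors with d = 64
print(ddpm_loss(theta0, zero_denoiser, alpha_bar, rng))  # close to 1.0, the variance of eps
```

A trained denoiser should drive this loss well below the 1.0 baseline of the zero predictor; sampling new PEFT candidates would then run the learned reverse process from pure noise.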

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Diffusion · Sparse Evolutionary Training