Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan; Phillip Isola

arXiv:2603.12228·cs.LG·March 13, 2026

Open Access

TL;DR

The paper treats the outcome of pretraining as a distribution over parameter vectors whose support already contains task-specific experts. It shows that in large, well-pretrained models these experts are dense around the pretrained weights, so simple sampling-based methods can improve task performance.

Contribution

The paper reframes the outcome of pretraining as a distribution over parameters in which task experts are dense, and shows that simple sampling methods can find task-specific solutions in large models.

Findings

01. Large models have a high density of task experts around pretrained weights.

02. Sampling and ensembling parameter perturbations can match standard post-training methods.

03. Simple parallel sampling approaches are competitive with complex optimization techniques.
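
One way to make the notion of "density of task experts" concrete is to count what fraction of random perturbations of a weight vector score better than the unperturbed weights. The sketch below is only an illustration under stated assumptions: the Gaussian perturbation, the scale `sigma`, the function names, and the two-parameter `toy_score` are all hypothetical stand-ins, not the paper's models, tasks, or measurement protocol.

```python
import random


def estimate_expert_density(theta, score, sigma=0.1, n_samples=1000, seed=0):
    """Monte Carlo estimate of the fraction of Gaussian perturbations of
    `theta` that score strictly higher than `theta` itself.

    Assumption: isotropic Gaussian noise with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    base = score(theta)
    hits = 0
    for _ in range(n_samples):
        candidate = [w + rng.gauss(0.0, sigma) for w in theta]
        if score(candidate) > base:
            hits += 1
    return hits / n_samples


def toy_score(w):
    """Hypothetical task score: higher when `w` is closer to a made-up optimum."""
    target = [1.0, -1.0]
    return -sum((a - b) ** 2 for a, b in zip(w, target))


# Weights that are near, but not at, the toy optimum.
density = estimate_expert_density([0.8, -0.7], toy_score)
print(f"estimated fraction of improving perturbations: {density:.3f}")
```

In this toy setting the estimate depends heavily on `sigma` and on how far the starting weights are from the optimum, so the number says nothing about real models; it only shows what quantity is being estimated.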

Abstract

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles…
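
The sample, select, and ensemble steps named in the abstract can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation. The abstract is truncated before it says how the ensemble is formed, so the majority vote used here is an assumption, as are the Gaussian perturbations, the scale `sigma`, the linear classifier, the function names, and the four-point dataset.

```python
import random
from collections import Counter


def predict(w, x):
    """Toy linear classifier (hypothetical): label 1 if w . x > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def accuracy(w, data):
    """Fraction of (x, y) pairs in `data` that `w` classifies correctly."""
    return sum(predict(w, x) == y for x, y in data) / len(data)


def sample_and_select(theta, data, n=64, k=5, sigma=0.5, seed=0):
    """Sample `n` perturbations of `theta` and keep the `k` best by accuracy.

    Assumption: isotropic Gaussian noise. Each candidate is independent of
    the others, so this step could be evaluated fully in parallel.
    """
    rng = random.Random(seed)
    candidates = [[w + rng.gauss(0.0, sigma) for w in theta] for _ in range(n)]
    candidates.sort(key=lambda w: accuracy(w, data), reverse=True)
    return candidates[:k]


def ensemble_predict(experts, x):
    """Majority vote over the selected experts (an assumed ensembling rule)."""
    votes = Counter(predict(w, x) for w in experts)
    return votes.most_common(1)[0][0]


# Made-up data: points with positive coordinates are class 1, negative are class 0.
data = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), 0), ((-0.5, -2.0), 0)]
theta = [0.1, -0.1]  # stand-in for "pretrained" weights

experts = sample_and_select(theta, data)
predictions = [ensemble_predict(experts, x) for x, _ in data]
```

An odd `k` avoids ties in the binary vote. No gradients are computed anywhere in the sketch: the only operations are perturbing, scoring, sorting, and voting.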

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis