NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling

Bram Grooten; Farid Hasanov; Chenxiang Zhang; Qiao Xiao; Boqian Wu; Zahra Atashgahi; Ghada Sokar; Shiwei Liu; Lu Yin; Elena Mocanu; Mykola Pechenizkiy; Decebal Constantin Mocanu

arXiv:2505.17909 · cs.LG · May 26, 2025

TL;DR

NeuroTrails introduces a dynamic sparse multi-head architecture that enhances ensemble performance and robustness in deep learning tasks while significantly reducing computational resources.

Contribution

The paper proposes a model-agnostic training paradigm with dynamic sparsity that improves ensemble effectiveness without training multiple full models.

Findings

1. Achieves higher accuracy on ImageNet with fewer parameters.

2. Enhances zero-shot robustness in language models.

3. Effective across convolutional and transformer architectures.

Abstract

Model ensembles have long been a cornerstone for improving generalization and robustness in deep learning. However, their effectiveness often comes at the cost of substantial computational overhead. To address this issue, state-of-the-art methods aim to replicate ensemble-class performance without requiring multiple independently trained networks. Unfortunately, these algorithms often still demand considerable compute at inference. In response to these limitations, we introduce NeuroTrails, a sparse multi-head architecture with dynamically evolving topology. This unexplored model-agnostic training paradigm improves ensemble performance while reducing the required resources. We analyze the underlying reason for its effectiveness and observe that the various neural trails induced by dynamic sparsity attain a "Goldilocks zone" of prediction diversity. NeuroTrails…
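To make the idea of sparse heads with an evolving topology concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the topology evolves or how head outputs are combined, so the magnitude-based prune and random regrow step (in the style of sparse evolutionary training), the averaging of head outputs, and the feature vector standing in for a shared backbone are all assumptions, as are the function names `make_mask`, `prune_and_regrow`, `head_forward`, and `ensemble_forward`.

```python
import random


def make_mask(n_weights, density, rng):
    """Return a 0/1 mask with round(density * n_weights) active connections."""
    n_active = round(density * n_weights)
    active = set(rng.sample(range(n_weights), n_active))
    return [1 if i in active else 0 for i in range(n_weights)]


def prune_and_regrow(weights, mask, fraction, rng):
    """One topology update (assumed rule): drop the smallest-magnitude active
    weights, then activate the same number of previously inactive positions."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(int(fraction * len(active)), len(inactive))
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    grown = rng.sample(inactive, k)
    new_weights, new_mask = list(weights), list(mask)
    for i in pruned:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = 1
        new_weights[i] = 0.0  # regrown connections start at zero
    return new_weights, new_mask


def head_forward(features, weights, mask, n_out):
    """Masked linear head; weights are stored row-major as n_out x n_in."""
    n_in = len(features)
    return [
        sum(weights[o * n_in + i] * mask[o * n_in + i] * features[i]
            for i in range(n_in))
        for o in range(n_out)
    ]


def ensemble_forward(features, heads, n_out):
    """Average the outputs of all heads (assumed combination rule)."""
    outputs = [head_forward(features, w, m, n_out) for w, m in heads]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


rng = random.Random(0)
n_in, n_out, n_heads, density = 8, 3, 3, 0.25

# Each head gets its own random sparse topology over the same input features.
heads = []
for _ in range(n_heads):
    mask = make_mask(n_in * n_out, density, rng)
    weights = [rng.gauss(0.0, 1.0) * m for m in mask]
    heads.append((weights, mask))

features = [rng.random() for _ in range(n_in)]  # stand-in for shared features
prediction = ensemble_forward(features, heads, n_out)

# Periodic topology update: every head rewires independently.
heads = [prune_and_regrow(w, m, 0.3, rng) for w, m in heads]
```

In this sketch each head keeps a fixed number of active connections across updates, so the parameter budget stays constant while the connectivity pattern changes. In a real training loop the update would run between gradient steps.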