Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers
C. Hepburn, T. Zielke, A.P. Raulf

TL;DR
This paper investigates how data shifts affect linear mode connectivity in deep image classifiers, revealing how training parameters influence model convergence and ensemble diversity under data distribution changes.
Contribution
It provides an experimental analysis of linear mode connectivity under data shifts and interprets data shifts as additional stochastic gradient noise, offering insights into model convergence and ensemble effectiveness.
Findings
Data shifts act as extra stochastic noise affecting model convergence.
Small learning rates and large batch sizes reduce the impact of data shifts.
Models sampled via LMC make similar errors more often than models that converge to different basins, but they still enable efficient, diverse ensembles.
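The effect of learning rate and batch size on gradient noise can be illustrated with a rough back-of-the-envelope sketch. It assumes i.i.d. mini-batch sampling, so the standard deviation of the averaged gradient shrinks as 1/sqrt(batch size), and it assumes (hypothetically, not taken from the paper) that a data shift contributes an independent per-sample noise term that adds in variance. The function name and parameters are illustrative only.

```python
import math


def update_noise_std(grad_std, learning_rate, batch_size, shift_std=0.0):
    """Approximate std of the noise in one SGD parameter update.

    Assumptions (not from the paper): per-sample gradient noise is i.i.d.
    with std `grad_std`, and a data shift adds an independent per-sample
    noise term with std `shift_std`.
    """
    # Independent noise sources add in variance.
    per_sample_std = math.sqrt(grad_std ** 2 + shift_std ** 2)
    # Averaging over a mini-batch divides the std by sqrt(batch_size);
    # the learning rate scales the resulting step.
    return learning_rate * per_sample_std / math.sqrt(batch_size)


if __name__ == "__main__":
    for lr, bs in [(0.1, 32), (0.1, 512), (0.01, 32), (0.01, 512)]:
        noise = update_noise_std(1.0, lr, bs, shift_std=0.5)
        print(f"lr={lr:<5} batch={bs:<4} update noise std ~ {noise:.5f}")
```

Under these assumptions, halving the learning rate halves the update noise, while quadrupling the batch size is needed for the same reduction.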
Abstract
The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing…
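For readers unfamiliar with LMC, the following minimal sketch shows one common way to test it: evaluate the loss along the straight line between two parameter vectors and measure the largest rise above the linear interpolation of the endpoint losses (the "barrier"). This barrier definition is a widely used convention and is assumed here; the excerpt does not state the paper's exact criterion. The toy loss functions and all names are hypothetical stand-ins for trained image classifiers.

```python
import numpy as np


def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line between two parameter vectors."""
    return (1.0 - alpha) * theta_a + alpha * theta_b


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the loss along the path over the linear
    interpolation of the two endpoint losses."""
    alphas = np.linspace(0.0, 1.0, num_points)
    path = np.array([loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas])
    baseline = (1.0 - alphas) * path[0] + alphas * path[-1]
    return float(np.max(path - baseline))


def convex_loss(theta):
    # Single basin: every straight path between two points has no barrier.
    return float(np.sum(theta ** 2))


def double_well_loss(theta):
    # Two basins at theta = -1 and theta = +1, separated by a bump at 0.
    return float(np.sum((theta ** 2 - 1.0) ** 2))


if __name__ == "__main__":
    a, b = np.array([-1.0]), np.array([1.0])
    print("convex barrier:", loss_barrier(convex_loss, a, b))
    print("double-well barrier:", loss_barrier(double_well_loss, a, b))
```

A barrier near zero indicates the two solutions are linearly mode connected; a clearly positive barrier indicates they sit in different basins. For real networks, `loss_fn` would load the interpolated weights into the model and evaluate it on held-out data.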
Taxonomy
Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
