Improving Generalization by Permutation Routing Across Model Copies
Shuhei Kashiwamura, Timothee Leleu

TL;DR
This paper applies the M-cover transform to machine learning: a model is replicated M times, and learning messages are routed across the copies by sampled permutations rather than by parameter averaging or an attractive coupling, with the aim of improving generalization.
Contribution
It introduces a structured message-routing framework in which each local loss is evaluated on a model assembled from different copies according to sampled permutations. The construction is formulated for perceptrons, committee machines, and multilayer perceptrons.
Findings
The method improves generalization in perceptrons and multilayer networks.
Structured message sharing outperforms traditional replica coupling.
The framework applies to both discrete models and differentiable neural networks.
Abstract
We introduce a use of the \(M\)-cover (or \(M\)-layer) transform for machine learning. The method replicates a model \(M\) times, but instead of coupling the copies through parameter averaging or an explicit attractive force, as in replicated SGD or Elastic SGD, it rewires the contexts in which local learning messages are computed. Each local loss is evaluated on a routed model whose parameters are drawn from different copies according to permutations sampled from a structured mixing kernel \(Q\). Training then uses the original local update rule, while the resulting learning messages are redistributed across the copies through these routed computational paths. Thus \(Q\) defines a topology for message transport and controls the long-loop structure of the lifted factor graph. We formulate this construction for perceptrons, committee machines, and multilayer perceptrons, showing that the…
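The routing idea in the abstract can be illustrated with a small sketch for a perceptron. This is not the authors' implementation. It assumes one permutation of the copies per input coordinate, the classical perceptron rule as the local update, and a hypothetical stand-in for the kernel \(Q\) (identity with probability `1 - mix_prob`, otherwise a random cyclic shift). The names `sample_permutation`, `routed_perceptron_step`, and `mix_prob` are illustrative only.

```python
import random


def sample_permutation(M, mix_prob, rng):
    """Hypothetical stand-in for the mixing kernel Q (an assumption, not from the paper):
    identity with probability 1 - mix_prob, otherwise a random non-trivial cyclic shift."""
    if M == 1 or rng.random() >= mix_prob:
        return list(range(M))
    shift = rng.randrange(1, M)
    return [(a + shift) % M for a in range(M)]


def routed_perceptron_step(W, x, y, lr, mix_prob, rng):
    """One routed update of M perceptron copies on example (x, y), with y in {-1, +1}.

    W[a][i] is weight i of copy a. Routed model a reads W[perms[i][a]][i],
    so its parameters come from different copies.
    """
    M, N = len(W), len(x)
    # Assumption: one independently sampled permutation per coordinate i.
    perms = [sample_permutation(M, mix_prob, rng) for _ in range(N)]

    updates = []
    for a in range(M):
        # Assemble the routed model from the pre-step weights.
        routed = [W[perms[i][a]][i] for i in range(N)]
        margin = y * sum(w * xi for w, xi in zip(routed, x))
        if margin <= 0:
            # Classical perceptron rule; each message is sent back to the
            # copy that supplied the corresponding parameter.
            for i in range(N):
                updates.append((perms[i][a], i, lr * y * x[i]))

    for src, i, delta in updates:
        W[src][i] += delta
    return perms


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.0, 0.0, 0.0] for _ in range(4)]  # M = 4 copies, N = 3 weights
    routed_perceptron_step(W, [1.0, -1.0, 0.5], 1, 0.1, 0.5, rng)
    print(W)
```

With `mix_prob = 0` every permutation is the identity and the step reduces to M independent perceptron updates. No weights are averaged at any point: the copies interact only through which parameters each routed model reads and where its updates are sent.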
