Mean-Field Limits for Two-Layer Neural Networks Trained with Consensus-Based Optimization

William De Deyn; Michael Herty; Giovanni Samaey

arXiv:2511.21466·cs.LG·December 3, 2025

TL;DR

This paper studies the mean-field limit of consensus-based optimization (CBO) for training two-layer neural networks, compares CBO with Adam on two test cases, shows that a hybrid CBO-Adam approach converges faster than CBO alone, and recasts CBO for multi-task learning in a form with less memory overhead.

Contribution

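The paper reformulates CBO within the optimal transport framework and couples the mean-field limit of CBO with the mean-field limit of the neural network. It also shows that a hybrid CBO-Adam scheme converges faster than CBO alone and that a recast multi-task formulation lowers memory overhead.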

Findings

01 In the limit of infinitely many particles, the variance of the CBO dynamics decreases monotonically.

02 The hybrid CBO-Adam approach converges faster than CBO alone.

03 Recasting CBO for multi-task learning reduces memory overhead.

Abstract

We study Consensus-Based Optimization (CBO) for two-layer neural network training. We compare the performance of CBO against Adam on two test cases and demonstrate how a hybrid approach, combining CBO with Adam, provides faster convergence than CBO. Additionally, in the context of multi-task learning, we recast CBO into a formulation that offers less memory overhead. The CBO method allows for a mean-field limit formulation, which we couple with the mean-field limit of the neural network. To this end, we first reformulate CBO within the optimal transport framework. In the limit of infinitely many particles, we define the corresponding dynamics on the Wasserstein-over-Wasserstein space and show that the variance decreases monotonically.
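
The abstract does not state the update rule, so the following is only a minimal sketch of the standard finite-particle CBO iteration from the general literature (Gibbs-weighted consensus point, drift toward it, and component-wise multiplicative noise), not necessarily the variant used in the paper. The function names, the parameters `alpha`, `lam`, `sigma`, `dt`, and their default values are all illustrative assumptions.

```python
import numpy as np


def consensus_point(particles, losses, alpha):
    """Gibbs-weighted average of the particles (the consensus point)."""
    # Subtracting the minimum loss leaves the normalized weights unchanged
    # and avoids underflow for large alpha.
    weights = np.exp(-alpha * (losses - losses.min()))
    return weights @ particles / weights.sum()


def cbo_step(particles, loss_fn, rng, alpha=50.0, lam=1.0, sigma=0.5, dt=0.05):
    """One Euler-Maruyama step of anisotropic CBO (assumed standard form)."""
    losses = np.array([loss_fn(p) for p in particles])
    x_bar = consensus_point(particles, losses, alpha)
    diff = particles - x_bar
    noise = rng.standard_normal(particles.shape)
    # Drift pulls each particle toward the consensus point; the noise is
    # scaled component-wise by the distance to it, so it vanishes at consensus.
    return particles - lam * dt * diff + sigma * np.sqrt(dt) * diff * noise


def run_cbo(loss_fn, dim=2, n_particles=200, n_steps=400, seed=0):
    """Run CBO from a uniform initial cloud; return consensus point and particles."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-3.0, 3.0, size=(n_particles, dim))
    for _ in range(n_steps):
        particles = cbo_step(particles, loss_fn, rng)
    losses = np.array([loss_fn(p) for p in particles])
    return consensus_point(particles, losses, alpha=50.0), particles
```

In a neural network setting, each particle would be a full parameter vector and `loss_fn` the training loss; no gradients of `loss_fn` are needed, which is the main contrast with Adam.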

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Quantum many-body systems