Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning

M Yashwanth, Gaurav Kumar Nayak, Harsh Rangwani, Arya Singh, R. Venkatesh Babu, and Anirban Chakraborty

arXiv:2512.08314 · cs.LG · December 10, 2025

TL;DR

This paper introduces a novel regularization technique called 'MAN' that minimizes layerwise activation norms to promote flatter minima, thereby enhancing the generalization of federated learning models.

Contribution

The paper proposes a flatness-constrained federated learning optimization method based on activation norm minimization, supported by theoretical analysis and by practical improvements over existing techniques.
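As a rough illustration of what a layerwise activation norm penalty could look like in a client's local objective, here is a minimal sketch in plain Python. It is not the authors' implementation: the tiny MLP, the squared-error task loss, the use of the unsquared L2 norm, the inclusion of the output layer in the penalty, and the names `activation_norm_penalty`, `regularized_loss`, and `reg_strength` are all assumptions made for the example.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, bias, x):
    # weights is a list of rows; computes W x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward_with_activations(layers, x):
    """Run a small MLP and collect the output activation of every layer."""
    activations = []
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = linear(weights, bias, h)
        if i < len(layers) - 1:  # assumption: no nonlinearity on the last layer
            h = relu(h)
        activations.append(h)
    return h, activations


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def activation_norm_penalty(activations):
    # Assumption: sum of per-layer L2 norms, all layers weighted equally.
    return sum(l2_norm(a) for a in activations)


def regularized_loss(layers, x, target, reg_strength):
    """Hypothetical local objective: task loss + reg_strength * penalty."""
    output, activations = forward_with_activations(layers, x)
    task_loss = sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)
    return task_loss + reg_strength * activation_norm_penalty(activations)
```

In a federated setup, each client would minimize an objective of this shape on its own data before the server aggregates the local models; how the penalty is weighted and which layers it covers would follow the paper, not this sketch.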

Findings

01 Significant improvement in model generalization in federated learning.

02 Theoretical proof that minimizing activation norms reduces Hessian eigenvalues.

03 State-of-the-art results on federated learning benchmarks.
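The second finding concerns Hessian eigenvalues, so it may help to see how the top eigenvalue of a loss Hessian can be estimated in practice. The sketch below uses power iteration with finite-difference Hessian-vector products. It is a generic numerical recipe, not a procedure taken from the paper; `grad_fn`, `eps`, and `num_iters` are hypothetical names and defaults chosen for the example.

```python
import math


def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H v by a central finite difference of the gradient."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def top_hessian_eigenvalue(grad_fn, w, num_iters=100):
    """Estimate the largest-magnitude Hessian eigenvalue at w.

    For a positive semi-definite Hessian this is the top eigenvalue.
    The fixed start vector keeps the example deterministic; a random
    start is the usual choice, since a fixed one can in rare cases be
    orthogonal to the leading eigenvector.
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n  # unit-length start vector
    eigenvalue = 0.0
    for _ in range(num_iters):
        hv = hessian_vector_product(grad_fn, w, v)
        norm = math.sqrt(sum(h * h for h in hv))
        if norm == 0.0:
            return 0.0
        # Rayleigh quotient v^T H v, valid because v has unit length.
        eigenvalue = sum(vi * hi for vi, hi in zip(v, hv))
        v = [h / norm for h in hv]
    return eigenvalue
```

For example, with the toy loss `0.5 * (4 * w0**2 + w1**2)`, whose gradient is `[4 * w0, w1]`, the Hessian is diagonal with entries 4 and 1, so the estimate should approach 4. A smaller value at the trained weights corresponds to a flatter minimum in the sense used by the paper.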

Abstract

Federated Learning (FL) is an emerging machine learning framework that enables multiple clients (coordinated by a server) to collaboratively train a global model by aggregating the locally trained models without sharing any client's training data. It has been observed in recent works that learning in a federated manner may lead the aggregated global model to converge to a 'sharp minimum', thereby adversely affecting the generalizability of this FL-trained model. Therefore, in this work, we aim to improve the generalization performance of models trained in a federated setup by introducing a 'flatness' constrained FL optimization problem. This flatness constraint is imposed on the top eigenvalue of the Hessian computed from the training loss. As each client trains a model on its local data, we further reformulate this complex problem utilizing the client loss functions and propose a new…
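One way to write the flatness-constrained problem described in the abstract is sketched below. The notation is assumed for illustration and is not taken from the paper: $w$ denotes the global model weights, $K$ the number of clients, $f_k$ the training loss of client $k$, and $\tau$ a flatness threshold; uniform averaging over clients is also an assumption.

```latex
\min_{w} \; f(w) = \frac{1}{K} \sum_{k=1}^{K} f_k(w)
\quad \text{subject to} \quad
\lambda_{\max}\!\left( \nabla_w^2 f(w) \right) \le \tau
```

Here $\lambda_{\max}$ is the top eigenvalue of the Hessian of the training loss, which is the quantity the abstract says the constraint is imposed on.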
