Entropy-driven Fair and Effective Federated Learning
Lin Wang, Zhichao Wang, Ye Shi, Sai Praneeth Karimireddy, Xiaoying Tang

TL;DR
This paper introduces an entropy-driven federated learning algorithm that balances fairness among clients with overall model performance, using a bi-level optimization framework and theoretical guarantees.
Contribution
The paper presents a novel entropy-based aggregation method with an analytic solution, improving fairness and performance in federated learning.
Findings
Outperforms state-of-the-art federated fairness algorithms.
Guarantees convergence in non-convex settings.
Enhances fairness and global accuracy simultaneously.
Abstract
Federated Learning (FL) enables collaborative model training across distributed devices while preserving data privacy. Nonetheless, the heterogeneity of edge devices often leads to inconsistent performance of the globally trained models, resulting in unfair outcomes among users. Existing federated fairness algorithms strive to enhance fairness but often fall short in maintaining the overall performance of the global model, typically measured by the average accuracy across all clients. To address this issue, we propose a novel algorithm that leverages entropy-based aggregation combined with model and gradient alignments to simultaneously optimize fairness and global model performance. Our method employs a bi-level optimization framework, where we derive an analytic solution to the aggregation probability in the inner loop, making the optimization process computationally efficient.…
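The abstract above does not state the closed form of the aggregation probability. As a rough illustration only: maximizing the entropy of a probability vector subject to a linear constraint on the expected client loss is known to yield a Gibbs (softmax) distribution over the losses. The sketch below assumes the analytic solution has that form. The function names, the temperature parameter `tau`, and the sample losses are hypothetical and are not taken from the paper.

```python
import math


def entropy_aggregation_weights(client_losses, tau=1.0):
    """Softmax (Gibbs) weights over client losses.

    Assumption: the aggregation probability is proportional to
    exp(loss_i / tau), so clients with higher loss get more weight.
    `tau` is a hypothetical temperature parameter.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtract the max loss before exponentiating for numerical stability.
    max_loss = max(client_losses)
    exps = [math.exp((loss - max_loss) / tau) for loss in client_losses]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_updates, weights):
    """Weighted sum of client update vectors (plain lists of floats)."""
    dim = len(client_updates[0])
    return [
        sum(w * update[k] for w, update in zip(weights, client_updates))
        for k in range(dim)
    ]


# Hypothetical per-client losses evaluated at the current global model.
losses = [0.2, 0.9, 0.5]
weights = entropy_aggregation_weights(losses, tau=0.5)

# Hypothetical two-dimensional client updates.
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_update = aggregate(updates, weights)
```

Under this assumed form, a large `tau` pushes the weights toward uniform averaging, while a small `tau` concentrates weight on the worst-performing clients.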
Peer Reviews
Decision·Submitted to ICLR 2026
1) Addresses a critical challenge in federated learning by simultaneously optimizing both global model performance and fairness across clients through a principled bi-level optimization framework. Previous methods like AFL seem to do well at the client level but lose performance at the global level. 2) Provides comprehensive empirical validation supported by rigorous theoretical analysis, featuring experiments across multiple datasets with diverse client level fairness metrics including variance, wors
1) Communication overhead per global round is not clearly outlined: Algorithm 1 indicates that FedEBA+ requires transmitting a fair gradient back to clients and collecting per-client losses and gradients at the current model state, which means additional downlink and uplink communication beyond FedAvg's requirements. Prac-FedEBA+ claims to maintain the same communication pattern as FedAvg. Can you include a small table comparing FedEBA+/Practical version with a breakdown of per round communic
* Constrained entropy maximization and performance fairness in FL - The authors provided thorough justification that the proposed objective can improve performance fairness by reducing the variance of the performance distribution (in Eq. (3), Proposition 4.1, and Appendix I.1). * Addendum on global utility - The authors proposed an additional trick to strike a balance between the utility and fairness of a global model by aligning the global update using the server-side ideal global gradient. (
- While different in motivation and objective, the resulting update formula coincides with that of `AAggFF`, cited in the draft. - The alignment update is not novel; it was proposed and used similarly in `FedFA` (Wang et al., 2021) and `FedMDFG` (Pan et al., 2023), which are cited in lines 43-44. - In Proposition 4.3, although the authors provide an approximation method for the aligned gradient, the increase in communication cost is unavoidable. - That being said, each client should upload local gradients,
1. The problem is interesting and important. Although FL fairness is a widely studied topic, it has never been entirely solved. 2. The algorithm using entropy-based aggregation and adaptive optimization strategy is novel and interesting. 3. The theoretical analysis is extensive and sound. 4. The experiments are convincing and show the promising consequences of the algorithm.
In general, it is a decent work, but my main concern is the strong assumption in the theoretical analysis. Part (2) of Theorem 5.4 is the key part of the theoretical analysis, yet it assumes strong convexity of the loss function. This is a strong condition and may limit the applicability of the result.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Privacy, Security, and Data Protection
