Measuring Heterogeneity in Machine Learning with Distributed Energy Distance
Mengchen Fan, Baocheng Geng, Roman Shterenberg, Joseph A. Casey, Zhong Chen, and Keren Li

TL;DR
This paper introduces energy distance as a sensitive measure for quantifying feature heterogeneity in distributed learning, develops efficient approximations, and applies it to improve model convergence and coordination across heterogeneous data sources.
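For reference, the standard population form of the (squared) energy distance between two feature distributions F and G on R^d is given below, together with its usual empirical V-statistic for samples x_1, …, x_n from F and y_1, …, y_m from G. This is textbook background, not necessarily the authors' exact normalization or estimator.

```latex
\mathcal{E}(F, G) = 2\,\mathbb{E}\lVert X - Y \rVert - \mathbb{E}\lVert X - X' \rVert - \mathbb{E}\lVert Y - Y' \rVert,
\qquad X, X' \overset{\text{iid}}{\sim} F,\quad Y, Y' \overset{\text{iid}}{\sim} G,

\hat{\mathcal{E}}_{n,m} = \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\lVert x_i - y_j \rVert
  - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{i'=1}^{n}\lVert x_i - x_{i'} \rVert
  - \frac{1}{m^2}\sum_{j=1}^{m}\sum_{j'=1}^{m}\lVert y_j - y_{j'} \rVert .
```

When both distributions have finite first moments, the energy distance is non-negative and equals zero exactly when F = G, which is what makes it usable as a shift detector. The three pairwise sums cost on the order of (n + m)^2 d operations, consistent with the point that direct use at scale is expensive.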
Contribution
It presents a novel application of energy distance to distributed learning, including scalable approximations and an energy-distance-based weighting method that improves model aggregation.
Findings
Energy distance effectively detects data distribution shifts.
Taylor approximations reduce computational costs significantly.
Using energy-distance-based weights improves convergence in distributed systems.
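To make the shift-detection finding concrete, here is a minimal NumPy sketch of the exact (non-approximated) V-statistic estimate of squared energy distance between two nodes' feature samples. The function names and the simulated Gaussian "nodes" are illustrative assumptions; this is not the paper's code, and it does not implement the Taylor approximations.

```python
import numpy as np


def _as_2d(a):
    # Treat a 1-D array as n scalar observations, i.e. shape (n, 1).
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def mean_pairwise_distance(a, b):
    # Average Euclidean distance over all pairs (a_i, b_j); uses O(n*m*d) memory.
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())


def energy_distance(x, y):
    """V-statistic estimate of the squared energy distance between
    samples x of shape (n, d) and y of shape (m, d)."""
    x, y = _as_2d(x), _as_2d(y)
    d = (2.0 * mean_pairwise_distance(x, y)
         - mean_pairwise_distance(x, x)
         - mean_pairwise_distance(y, y))
    # The V-statistic is non-negative in exact arithmetic; clip rounding noise.
    return max(d, 0.0)


# Hypothetical example: three nodes with 3 features each.
rng = np.random.default_rng(0)
node_a = rng.normal(size=(200, 3))            # reference node
node_b = rng.normal(size=(200, 3))            # same feature distribution
node_c = rng.normal(loc=1.0, size=(200, 3))   # mean-shifted features

d_same = energy_distance(node_a, node_b)
d_shift = energy_distance(node_a, node_c)
print(f"same: {d_same:.3f}, shifted: {d_shift:.3f}")
```

The shifted node should yield a much larger value than the same-distribution node, whose estimate stays near zero up to sampling noise. The all-pairs computation is exactly the cost that motivates cheaper approximations for large samples.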
Abstract
In distributed and federated learning, heterogeneity across data sources remains a major obstacle to effective model aggregation and convergence. We focus on feature heterogeneity and introduce energy distance as a sensitive measure for quantifying distributional discrepancies. While we show that energy distance is robust for detecting data distribution shifts, its direct use in large-scale systems can be prohibitively expensive. To address this, we develop Taylor approximations that preserve key theoretical quantitative properties while reducing computational overhead. Through simulation studies, we show how accurately capturing feature discrepancies boosts convergence in distributed learning. Finally, we propose a novel application of energy distance to assign penalty weights for aligning predictions across heterogeneous nodes, ultimately enhancing coordination in federated and…
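The abstract does not say how energy distances are turned into penalty weights, so the sketch below uses one hypothetical rule: each node's weight is its energy distance to a shared reference sample, normalized to sum to one, so that more heterogeneous nodes receive a larger alignment penalty. The helper `penalty_weights`, the reference sample, and the normalization are all assumptions for illustration, not the authors' scheme.

```python
import numpy as np


def energy_distance(x, y):
    # V-statistic estimate of the squared energy distance (samples as rows).
    def mean_dist(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
    return max(2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y), 0.0)


def penalty_weights(node_samples, reference):
    """Hypothetical rule: node k's penalty weight is its energy distance to a
    shared reference sample, normalized so the weights sum to one."""
    d = np.array([energy_distance(s, reference) for s in node_samples])
    if d.sum() == 0.0:
        # No measurable heterogeneity: fall back to uniform weights.
        return np.full(len(d), 1.0 / len(d))
    return d / d.sum()


# Hypothetical setup: three nodes whose 2-D features drift increasingly
# far from the reference distribution.
rng = np.random.default_rng(1)
reference = rng.normal(size=(150, 2))
nodes = [rng.normal(loc=shift, size=(150, 2)) for shift in (0.0, 1.0, 3.0)]
weights = penalty_weights(nodes, reference)
print(weights)
```

Proportional weighting suits a penalty term that pulls divergent nodes toward agreement; if the weights were instead used to down-weight heterogeneous nodes during aggregation, a decreasing map such as exp(-D / tau) would be the more natural (equally hypothetical) choice.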
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus
