A Universal Metric of Dataset Similarity for Cross-silo Federated Learning

Ahmed Elhussein; Gamze Gursoy

arXiv:2404.18773 · cs.LG · October 8, 2025

TL;DR

This paper introduces a novel, privacy-preserving, dataset-agnostic metric for measuring dataset similarity in federated learning, which correlates well with model performance and aids cross-site collaboration.

Contribution

The paper presents the first federated dataset similarity metric that is dataset-agnostic, privacy-preserving, and computationally efficient, addressing limitations of existing metrics.

Findings

01. The metric correlates robustly with model performance in FL.
02. It can be computed without data sharing or model training.
03. The metric is effective across synthetic, benchmark, and medical datasets.

Abstract

Federated Learning (FL) is increasingly used in domains such as healthcare to facilitate collaborative model training without data sharing. However, datasets located at different sites are often non-identically distributed, which degrades model performance in FL. Most existing methods for assessing these distribution shifts are limited by being dataset- or task-specific. Moreover, these metrics can only be calculated by exchanging data, a practice restricted in many FL scenarios. To address these challenges, we propose a novel metric for assessing dataset similarity. Our metric exhibits several desirable properties for FL: it is dataset-agnostic, is calculated in a privacy-preserving manner, and is computationally efficient, requiring no model training. In this paper, we first establish a theoretical connection between our metric and training dynamics in FL. Next, we extensively…

Code & Models

Repositories

annoymous-submissions/ot_cost (PyTorch, official)

Taxonomy

Topics: Face and Expression Recognition · Advanced Graph Neural Networks · Brain Tumor Detection and Classification