SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated learning
Md Anwar Hossen, Nathan R. Tallent, Luanzheng Guo, Ali Jannesary

TL;DR
SCOPE is a coreset selection method for federated learning that filters anomalies and prunes redundant data using a global consensus built from scalar metrics, improving efficiency and robustness on class-imbalanced data from high-resolution instruments.
Contribution
The paper presents SCOPE, a coreset framework that uses orthogonal projection embeddings and global consensus to guide data selection in federated learning, with the aim of mitigating class imbalance and reducing communication cost.
Findings
Achieves 128x to 512x reduction in uplink bandwidth.
Demonstrates robust convergence and competitive accuracy.
Reduces FLOP and VRAM footprints for local coreset selection.
Abstract
Scientific discovery increasingly requires learning on federated datasets, fed by streams from high-resolution instruments, that have extreme class imbalance. Current ML approaches either require impractical data aggregation or fail due to class imbalance. Existing coreset selection methods rely on local heuristics, making them unaware of the global data landscape and prone to sub-optimal and non-representative pruning. To overcome these challenges, we introduce SCOPE (Semantic Coreset using Orthogonal Projection Embeddings for Federated learning), a coreset framework for federated data that filters anomalies and adaptively prunes redundant data to mitigate long-tail skew. By analyzing the latent space distribution, we score each data point using a representation score that measures the reliability of core class features, a diversity score that quantifies the novelty of orthogonal…
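The abstract is cut off before the scores are defined, so the following is only a minimal sketch of the general idea of scoring points by orthogonal projection in a latent space. It is not the authors' formulation. It assumes, hypothetically, that each class has a single prototype vector, that the representation score is the projection of a normalized embedding onto that prototype, and that the diversity score is the norm of the orthogonal residual. The names `projection_scores`, `embeddings`, and `prototype` are illustrative.

```python
import numpy as np


def projection_scores(embeddings, prototype):
    """Split each embedding into a part along a class prototype and an
    orthogonal residual, and return one scalar score for each part.

    ASSUMPTION: these definitions are illustrative, not taken from the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    prototype = np.asarray(prototype, dtype=float)

    # Unit vector for the (hypothetical) class direction.
    u = prototype / np.linalg.norm(prototype)

    # Normalize rows so scores are comparable across samples.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Representation score: signed length of the projection onto u.
    representation = z @ u

    # Orthogonal residual: what remains after removing the class direction.
    residual = z - np.outer(representation, u)

    # Diversity score: magnitude of the orthogonal residual.
    diversity = np.linalg.norm(residual, axis=1)
    return representation, diversity


# Three toy 2-D embeddings scored against a prototype along the x-axis.
rep, div = projection_scores([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 0.0])
```

Both outputs are one scalar per sample, which is the kind of quantity a client could share in place of raw data or full embeddings. Because the embeddings are normalized, the two scores in this sketch satisfy representation² + diversity² = 1 for every sample.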
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
