TL;DR
This paper introduces an inference load-aware orchestration scheme for hierarchical federated learning that optimizes model placement and device association to reduce inference latency and communication costs in continual learning scenarios.
Contribution
It presents a novel joint orchestration approach that considers inference workloads and processing capacities to improve efficiency in hierarchical federated learning.
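The summary does not spell out the orchestration algorithm. As a rough illustration of what "inference load-aware" device association can mean, here is a hypothetical greedy heuristic, not the paper's method. Devices are attached to the aggregator with the lowest estimated serving latency without exceeding its processing capacity. The latency model (network delay plus an M/M/1 queueing term), the names `Aggregator`, `est_latency` and `associate`, and all numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    name: str
    capacity: float              # hypothetical: inference requests/s the node can serve
    load: float = 0.0            # inference load already assigned (requests/s)
    devices: list = field(default_factory=list)


def est_latency(net_ms: float, agg: Aggregator, extra_load: float) -> float:
    """Toy model: network delay plus M/M/1 mean sojourn time, both in ms."""
    new_load = agg.load + extra_load
    if new_load >= agg.capacity:
        return float("inf")      # this assignment would overload the node
    return net_ms + 1000.0 / (agg.capacity - new_load)


def associate(device_loads: dict, net_ms: dict, aggs: list) -> dict:
    """Greedily attach each device to the aggregator with the lowest estimated latency.

    device_loads: device -> inference request rate (requests/s)
    net_ms: (device, aggregator name) -> network latency in ms
    """
    assignment = {}
    # Place the heaviest devices first so they claim the best-suited nodes.
    for dev in sorted(device_loads, key=device_loads.get, reverse=True):
        load = device_loads[dev]
        best = min(aggs, key=lambda a: est_latency(net_ms[(dev, a.name)], a, load))
        if est_latency(net_ms[(dev, best.name)], best, load) == float("inf"):
            raise RuntimeError(f"no aggregator can absorb the load of {dev}")
        best.load += load
        best.devices.append(dev)
        assignment[dev] = best.name
    return assignment


# Hypothetical example: two edge aggregators, three inference-generating devices.
aggs = [Aggregator("edge-A", capacity=100.0), Aggregator("edge-B", capacity=50.0)]
device_loads = {"d1": 40.0, "d2": 30.0, "d3": 20.0}
net_ms = {
    ("d1", "edge-A"): 5, ("d1", "edge-B"): 2,
    ("d2", "edge-A"): 5, ("d2", "edge-B"): 2,
    ("d3", "edge-A"): 4, ("d3", "edge-B"): 3,
}
assignment = associate(device_loads, net_ms, aggs)
print(assignment)
```

Note that `d1` and `d2` go to the larger `edge-A` even though `edge-B` is closer on the network, because the queueing term on the smaller node dominates; `d3` then lands on `edge-B` once `edge-A` is nearly saturated. A joint scheme like the one described would also account for training traffic and aggregator placement, which this sketch ignores.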
Findings
Inference latency is significantly reduced.
Communication costs are drastically reduced.
Optimized aggregator placement improves performance.
Abstract
Hierarchical federated learning (HFL) designs introduce intermediate aggregator nodes between clients and the global federated learning server in order to reduce communication costs and distribute server load. One side effect is that machine learning model replication at scale comes "for free" as part of the HFL process: model replicas are hosted at the client end, intermediate nodes, and the global server level and are readily available for serving inference requests. This creates opportunities for efficient model serving but simultaneously couples the training and serving processes and calls for their joint orchestration. This is particularly important for continual learning, where serving a model while (re)training it periodically, upon specific triggers, or continuously, takes place over shared infrastructure spanning the computing continuum. Consequently, training and inference…
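To make the "for free" replication concrete, the following is a minimal sketch of one two-level aggregation round. It assumes FedAvg-style sample-weighted averaging at both the intermediate and global levels (the abstract does not state the aggregation rule) and represents each model as a flat list of floats. The function names and data are hypothetical.

```python
def weighted_average(models, weights):
    """Element-wise average of equal-length parameter lists, weighted by `weights`."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]


def hierarchical_round(clusters):
    """One HFL round.

    clusters: aggregator name -> list of (client_params, n_samples).
    Returns the per-aggregator models (replicas that could serve inference)
    and the global model.
    """
    edge_models, edge_sizes = {}, {}
    for agg, clients in clusters.items():
        params = [p for p, _ in clients]
        sizes = [n for _, n in clients]
        edge_models[agg] = weighted_average(params, sizes)  # replica hosted at the aggregator
        edge_sizes[agg] = sum(sizes)
    # The global server averages edge models, weighted by the data behind each one.
    global_model = weighted_average(list(edge_models.values()), list(edge_sizes.values()))
    return edge_models, global_model


clusters = {
    "edge-A": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "edge-B": [([5.0, 6.0], 60)],
}
edge_models, global_model = hierarchical_round(clusters)
print(edge_models, global_model)
```

Because each edge average is re-weighted by its total sample count, the global result matches a flat FedAvg over all clients, while every entry of `edge_models` is a complete model replica that an intermediate node could use to answer inference requests.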
