Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
Prasanjit Dubey, Xiaoming Huo

TL;DR
This paper analyzes the statistical guarantees of federated language models under bandwidth constraints, focusing on distillation and conformal coverage, with theoretical bounds and synthetic experiments.
Contribution
It introduces theoretical bounds for federated language models that treat bandwidth as a first-class statistical parameter, unifying training-time and inference-time guarantees.
Findings
Bandwidth affects model consistency only through an exponentially vanishing term.
Per-node retrieval bandwidth influences the coverage slack, which decreases as the number of nodes increases.
Synthetic experiments confirm the theoretical scaling laws.
Abstract
Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the regime in which data must remain distributed across nodes, and ask what statistical guarantees are in principle achievable under explicit bandwidth budgets; we aim to characterize what is provably possible, not to demonstrate a deployment-ready system. Existing theory treats either training-time consistency or inference-time calibration in isolation, and none makes bandwidth a first-class statistical parameter. We analyze two protocols, Federated Probe-Logit Distillation (FPLD) for training and Federated Conformal RAG (FC-RAG) for inference, as the analytical vehicles for our results. Our first main result is an explicit high-probability KL-consistency rate for FPLD with…
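Conformal coverage guarantees of the kind FC-RAG targets rest on a calibration step. The following Python sketch shows the generic split-conformal version of that step, for illustration only. It assumes nonconformity scores have already been computed on a held-out calibration set that is exchangeable with the test point. The function names (`split_conformal_threshold`, `prediction_set`) and the toy data are hypothetical. This summary does not describe how FC-RAG defines its scores, pools calibration data across nodes, or spends retrieval bandwidth, so this is not the paper's protocol.

```python
import math
from typing import Dict, Hashable, Sequence, Set


def split_conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Score threshold for generic split conformal prediction (not FC-RAG-specific).

    cal_scores: nonconformity scores on a held-out calibration set, assumed
                exchangeable with the test point (hypothetical input).
    alpha: target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only an infinite threshold is valid.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidate_scores: Dict[Hashable, float], threshold: float) -> Set[Hashable]:
    """Keep every candidate answer whose nonconformity score is at most the threshold."""
    return {c for c, s in candidate_scores.items() if s <= threshold}


if __name__ == "__main__":
    # Toy calibration scores 1..19 (hypothetical data, for illustration only).
    tau = split_conformal_threshold(list(range(1, 20)), alpha=0.1)
    print(tau)  # 18
    print(prediction_set({"a": 2.0, "b": 19.5, "c": 18.0}, tau))
```

Under exchangeability, this threshold yields marginal coverage of at least 1 - alpha. Using the ceil((n + 1)(1 - alpha)) rank instead of ceil(n(1 - alpha)) is the finite-sample correction that makes the guarantee hold exactly instead of only asymptotically.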
