Floe: Federated Specialization for Real-Time LLM-SLM Inference

Chunlin Tian; Kahou Tam; Yebo Wu; Shuaihang Zhong; Li Li; Nicholas D. Lane; Chengzhong Xu

arXiv:2602.14302 · cs.DC · February 17, 2026

TL;DR

Floe is a hybrid federated learning framework that pairs a cloud-based black-box LLM with lightweight small language models (SLMs) on edge devices, enabling real-time, privacy-preserving inference in resource-constrained environments while reducing latency and improving personalization.

Contribution

Floe introduces a hybrid federated approach with heterogeneity-aware adaptation and logit fusion for efficient, privacy-preserving LLM inference on edge devices.

Findings

01. Reduces inference latency significantly compared to baselines.

02. Enhances user privacy and personalization.

03. Improves model performance on edge devices.

Abstract

Deploying large language models (LLMs) in real-time systems remains challenging due to their substantial computational demands and privacy concerns. We propose Floe, a hybrid federated learning framework designed for latency-sensitive, resource-constrained environments. Floe combines a cloud-based black-box LLM with lightweight small language models (SLMs) on edge devices to enable low-latency, privacy-preserving inference. Personal data and fine-tuning remain on-device, while the cloud LLM contributes general knowledge without exposing proprietary weights. A heterogeneity-aware LoRA adaptation strategy enables efficient edge deployment across diverse hardware, and a logit-level fusion mechanism enables real-time coordination between edge and cloud models. Extensive experiments demonstrate that Floe enhances user privacy and personalization. Moreover, it significantly improves model…
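The abstract names a logit-level fusion mechanism between the edge SLM and the cloud LLM but does not specify it here. As a rough illustration only, the sketch below assumes the simplest possible variant: both models emit logits over a shared vocabulary, and the next token is chosen from a fixed weighted sum of the two. The function names (`fuse_logits`, `next_token`), the mixing weight `alpha`, and the edge-only fallback when no cloud logits arrive are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(edge_logits, cloud_logits, alpha=0.5):
    """Weighted sum of edge and cloud logits (assumed fusion rule).

    alpha is a hypothetical mixing weight: 1.0 trusts only the edge SLM,
    0.0 trusts only the cloud LLM. Both inputs must share one vocabulary.
    """
    if len(edge_logits) != len(cloud_logits):
        raise ValueError("edge and cloud logits must cover the same vocabulary")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * e + (1.0 - alpha) * c
            for e, c in zip(edge_logits, cloud_logits)]


def next_token(edge_logits, cloud_logits=None, alpha=0.5):
    """Greedy next-token choice from fused logits.

    If no cloud logits are available (for example, the response did not
    arrive in time), fall back to the edge model alone. This fallback is
    an assumption made for the sketch.
    """
    if cloud_logits is None:
        fused = list(edge_logits)
    else:
        fused = fuse_logits(edge_logits, cloud_logits, alpha)
    probs = softmax(fused)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    edge = [2.0, 0.0, 0.0]   # toy 3-token vocabulary
    cloud = [0.0, 3.0, 0.0]
    print(next_token(edge, cloud, alpha=0.5))
    print(next_token(edge, cloud, alpha=0.9))
    print(next_token(edge))
```

In this toy setup, a larger `alpha` shifts the decision toward the on-device model, which is one way personalization could outweigh the cloud model's general knowledge; a real system would also need to reconcile tokenizers if the two models do not share a vocabulary.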

Taxonomy

Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Machine Learning in Healthcare