FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices

Changyu Li, Shuanghong Huang, Jiashen Liu, Ming Lei, Jidu Xing, Kaishun Wu, Lu Wang, and Fei Luo

arXiv:2604.25421 · cs.LG · April 29, 2026

TL;DR

Fed-FSTQ introduces a Fisher-guided token quantization method that reduces uplink communication overhead and accelerates federated fine-tuning of large language models (LLMs) on edge devices.

Contribution

Fed-FSTQ is a model-agnostic scheme that couples importance-aware token selection with mixed-precision quantization, targeting efficient federated LLM fine-tuning under bandwidth constraints.
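To make the two ingredients concrete, here is a minimal Python sketch of importance-aware token selection followed by mixed-precision quantization. It assumes per-token importance scores are already available, keeps the highest-scoring fraction of tokens, encodes the top tier at 8 bits and the rest at 4 bits with a symmetric per-token uniform quantizer, and drops the remainder. The function names (`quantize`, `select_and_quantize`, `payload_bits`), the two-tier bit allocation, and the ratios are illustrative assumptions, not the paper's actual configuration or quantizer.

```python
from typing import Dict, List, Tuple


def quantize(vec: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization of one token vector to signed `bits`-bit codes."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in vec)
    scale = amax / qmax if amax > 0 else 1.0
    # Clamp guards against tiny floating-point overshoot past +/- qmax.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in vec]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def select_and_quantize(tokens, scores, keep_ratio=0.5, high_ratio=0.25,
                        high_bits=8, low_bits=4) -> Dict[int, Tuple[int, List[int], float]]:
    """Keep the most important tokens and give the top tier more bits.

    Returns {token_index: (bits, codes, scale)}; dropped tokens are omitted.
    """
    n = len(tokens)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(round(keep_ratio * n)))
    n_high = min(n_keep, int(round(high_ratio * n)))
    payload = {}
    for rank, i in enumerate(order[:n_keep]):
        bits = high_bits if rank < n_high else low_bits
        codes, scale = quantize(tokens[i], bits)
        payload[i] = (bits, codes, scale)
    return payload


def payload_bits(payload, scale_bits: int = 32) -> int:
    """Uplink size in bits: the codes plus one float scale per kept token."""
    return sum(bits * len(codes) + scale_bits for bits, codes, _ in payload.values())


# Toy example: four 4-dimensional token vectors with made-up importance scores.
tokens = [[0.5, -1.0, 0.25, 0.0],
          [0.1, 0.2, -0.1, 0.05],
          [2.0, -2.0, 1.0, 0.5],
          [0.01, 0.0, -0.02, 0.0]]
scores = [0.3, 0.1, 0.5, 0.1]
payload = select_and_quantize(tokens, scores, keep_ratio=0.5, high_ratio=0.25)
print(sorted(payload), payload_bits(payload))  # [0, 2] 112
```

In this toy run only tokens 2 (8-bit) and 0 (4-bit) are sent, costing 112 bits including per-token scales, versus 512 bits for the full fp32 tensor. A real system would also need to signal which token indices were kept; that overhead is ignored here.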

Findings

01

Reduces the uplink traffic required to reach a fixed quality threshold by 46x.

02

Improves wall-clock time-to-accuracy by 52% in federated settings.

03

Achieves a 1.55x inference speedup on edge devices.

Abstract

Federated fine-tuning provides a practical route to adapt large language models (LLMs) on edge devices without centralizing private data, yet in mobile deployments training wall-clock time is often bottlenecked by straggler-limited uplink communication under heterogeneous bandwidth and intermittent participation. Although parameter-efficient fine-tuning (PEFT) reduces trainable parameters, per-round payloads remain prohibitive in non-IID regimes, where uniform compression can discard rare but task-critical signals. We propose Fed-FSTQ, a Fisher-guided token quantization system primitive for communication-efficient federated LLM fine-tuning. Fed-FSTQ employs a lightweight Fisher proxy to estimate token sensitivity, coupling importance-aware token selection with non-uniform mixed-precision quantization to allocate higher fidelity to informative evidence while suppressing redundant…
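The abstract does not give the exact form of the Fisher proxy. A common lightweight choice, assumed here, is the diagonal of the empirical Fisher information, estimated per token as the sum of squared loss gradients with respect to that token's hidden state, s_t = Σ_j (∂L/∂h_{t,j})². The Python sketch computes these scores; `fisher_token_scores`, `toy_token_grads`, and the toy linear squared-error loss are hypothetical stand-ins for gradients that a real LLM would obtain by back-propagation.

```python
from typing import List


def fisher_token_scores(token_grads: List[List[float]], normalize: bool = True) -> List[float]:
    """Diagonal empirical-Fisher proxy: sum of squared loss gradients per token."""
    scores = [sum(g * g for g in grad) for grad in token_grads]
    if normalize:
        total = sum(scores)
        if total > 0:
            scores = [s / total for s in scores]  # relative sensitivity, sums to 1
    return scores


def toy_token_grads(hidden, targets, w):
    """Gradient of 0.5 * (w . h_t - y_t)**2 with respect to each token state h_t.

    Hypothetical toy loss standing in for back-propagated LLM gradients.
    """
    grads = []
    for h, y in zip(hidden, targets):
        residual = sum(wi * hi for wi, hi in zip(w, h)) - y
        grads.append([residual * wi for wi in w])
    return grads


hidden = [[1.0, 5.0], [3.0, -1.0], [0.0, 2.0]]  # three tokens, 2-dim states
targets = [0.0, 1.0, 0.5]
w = [1.0, 0.0]

grads = toy_token_grads(hidden, targets, w)
raw = fisher_token_scores(grads, normalize=False)
scores = fisher_token_scores(grads)
print(raw)  # [1.0, 4.0, 0.25]
```

Token 1 has the largest residual and therefore the largest score. Under a selection scheme like the one the abstract describes, such high-sensitivity tokens would be the ones granted higher fidelity, while low-scoring tokens could be coarsely quantized or dropped.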