HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization
Yang Su, Na Yan, Yansha Deng, Mischa Dohler, and Robert Schober

TL;DR
HAFLQ is a comprehensive federated fine-tuning framework for LLMs that employs adaptive quantization, importance-based parameter management, and efficient aggregation to reduce resource usage and improve accuracy in heterogeneous environments.
Contribution
The paper introduces salience-driven adaptive LLM quantization, parameter truncation, bandwidth-aware quantization, and matrix aggregation strategies for scalable federated LLM fine-tuning.
Findings
Reduces memory usage by 31%
Lowers communication cost by 49%
Improves accuracy by 50%
Abstract
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy. However, challenges such as high computational and memory demands, heterogeneous client resources, bandwidth constraints, and ineffective global aggregation hinder its efficiency. To address these issues, we propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable federated fine-tuning of LLMs in heterogeneous environments. To reduce memory and computation demands, we propose a salience-driven adaptive LLM quantization framework that evaluates the importance of transformer blocks using a salience metric and applies adaptive block-wise quantization accordingly. To handle heterogeneous computational capabilities, we propose an importance-based parameter…
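The salience-driven, block-wise quantization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. The abstract does not define the salience metric, so the sketch takes per-block salience scores as given inputs. The function names (`assign_bit_widths`, `quantize_block`), the 8-bit/4-bit choice, and the `keep_fraction` parameter are all hypothetical, and plain symmetric uniform quantization stands in for whatever quantizer the paper actually uses.

```python
from typing import List


def assign_bit_widths(
    salience: List[float],
    high_bits: int = 8,
    low_bits: int = 4,
    keep_fraction: float = 0.25,
) -> List[int]:
    """Assign a bit-width to each block from its salience score.

    Hypothetical policy: the top `keep_fraction` of blocks (at least one)
    keep `high_bits`; every other block is quantized to `low_bits`.
    """
    n = len(salience)
    if n == 0:
        return []
    n_high = max(1, round(n * keep_fraction))
    # Block indices ordered from most to least salient.
    ranked = sorted(range(n), key=lambda i: salience[i], reverse=True)
    high = set(ranked[:n_high])
    return [high_bits if i in high else low_bits for i in range(n)]


def quantize_block(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantize-then-dequantize of one block's weights."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels          # step size between adjacent levels
    return [round(w / scale) * scale for w in weights]


if __name__ == "__main__":
    # Assumed salience scores for four transformer blocks.
    scores = [0.1, 0.9, 0.3, 0.5]
    bit_widths = assign_bit_widths(scores)
    print(bit_widths)
    # Quantize a toy block of weights at its assigned precision.
    print(quantize_block([1.0, -0.5, 0.25], bit_widths[0]))
```

Under this policy, more salient blocks lose less precision, while less salient blocks are compressed harder to save memory. A real system would compute salience from the model and data rather than take it as input.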
Taxonomy
Topics: IoT and Edge/Fog Computing · Energy Efficient Wireless Sensor Networks · Wireless Sensor Networks for Data Analysis
