EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning
Songlin Zhao, Michael Pitts, Zhuwei Qin

TL;DR
EfficientXpert is a resource-efficient framework that adapts large language models to specific domains through pruning, retaining most of the dense model's performance at a fine-tuning cost comparable to LoRA.
Contribution
It introduces a propagation-aware pruning criterion (ForeSight Mask) and a closed-form update for low-rank adapters (Partial Brain Surgeon), enabling domain adaptation at a fine-tuning cost comparable to LoRA.
Findings
Achieves up to 98% of dense-model performance at 40% sparsity.
Outperforms prior pruning methods on health and legal benchmarks.
Maintains training time and GPU memory within 1% of LoRA.
Abstract
Large language models (LLMs) are increasingly adapted into domain-specific variants for applications in law, healthcare, and finance. Their scale, however, limits deployment in resource-constrained settings, and existing compression approaches often either degrade after domain adaptation or require substantial additional computation. We introduce EfficientXpert, a lightweight framework for domain pruning that integrates ForeSight Mask, a propagation-aware criterion for selecting weights to prune without backpropagation, and Partial Brain Surgeon, an efficient closed-form update for low-rank adapters under a fixed sparsity pattern. With fine-tuning cost comparable to standard LoRA, EfficientXpert converts a general pretrained model into a sparse, domain-adapted expert in a single pruning step. Across health and legal benchmarks, EfficientXpert reaches up to 98 percent of dense…
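To make the setting concrete, here is a minimal sketch of what pruning a LoRA-adapted layer to a fixed sparsity pattern looks like. It is not the paper's method: the mask is chosen by plain magnitude pruning as a stand-in for ForeSight Mask, and no closed-form adapter correction such as Partial Brain Surgeon is applied. The function names, layer shapes, and adapter rank are all hypothetical and chosen only for illustration.

```python
import numpy as np


def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a boolean keep-mask that drops the smallest-magnitude entries.

    NOTE: plain magnitude pruning, used here as a stand-in criterion.
    It is NOT the propagation-aware ForeSight Mask from the paper.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * weights.size))
    if n_prune == 0:
        return np.ones(weights.shape, dtype=bool)
    flat = np.abs(weights).ravel()
    # Indices of the n_prune smallest magnitudes.
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    keep = np.ones(flat.size, dtype=bool)
    keep[prune_idx] = False
    return keep.reshape(weights.shape)


def prune_merged_lora(W: np.ndarray, B: np.ndarray, A: np.ndarray, sparsity: float):
    """Merge a low-rank update B @ A into W, then apply a fixed sparsity mask."""
    merged = W + B @ A
    mask = magnitude_mask(merged, sparsity)
    return merged * mask, mask


# Hypothetical layer: 64 x 32 dense weight with a rank-4 adapter.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
B = rng.normal(size=(64, 4)) * 0.01
A = rng.normal(size=(4, 32))

sparse_W, mask = prune_merged_lora(W, B, A, sparsity=0.40)
achieved_sparsity = 1.0 - mask.mean()
print(f"achieved sparsity: {achieved_sparsity:.4f}")
```

The sketch only shows the bookkeeping: a low-rank update is merged into the dense weight, and a mask then zeroes a fixed fraction of the result. According to the abstract, the paper's contribution lies in how the mask is chosen and how the adapter is updated under that mask, neither of which this example attempts to reproduce.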
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education · Topic Modeling
