ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Hao Chen; Haoze Li; Zhiqing Xiao; Lirong Gao; Qi Zhang; Xiaomeng Hu; Ningtao Wang; Xing Fu; Junbo Zhao

arXiv:2505.18799 · cs.CL · June 19, 2025

TL;DR

ALPS aligns large language models efficiently by localizing the most task-sensitive attention heads and restricting fine-tuning updates to them, reducing training cost while improving performance and transferability across tasks.

Contribution

The paper proposes ALPS, an algorithm that localizes the most task-sensitive attention heads and restricts attention training updates to those heads, improving alignment efficiency while avoiding the data dependency of prior head-selection methods.

Findings

01. Activates only 10% of attention parameters during fine-tuning.

02. Achieves a 2% performance improvement over baselines.

03. Heads identified are transferable across datasets and reduce knowledge forgetting.

Abstract

Aligning general-purpose large language models (LLMs) to downstream tasks often incurs significant training adjustment costs. Prior research has explored various avenues to enhance alignment efficiency, primarily through minimal-data training or data-driven activations to identify key attention heads. However, these approaches inherently introduce data dependency, which hinders generalization and reusability. To address this issue and enhance model alignment efficiency, we propose the Attention Localization and Pruning Strategy (ALPS), an efficient algorithm that localizes the most task-sensitive attention heads and prunes by restricting attention training updates to these heads, thereby reducing alignment costs. Experimental results demonstrate that our method activates only 10% of attention parameters during fine-tuning while achieving a 2% performance improvement over baselines on…
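The idea of restricting attention updates to a small set of heads can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration: the scoring rule (the per-head norm of the weight shift between a base and a tuned projection) is a simple stand-in and not the sensitivity metric used in the paper, and `select_heads`, `head_gradient_mask`, and `masked_sgd_step` are hypothetical helpers. The sketch only shows the general pattern of keeping roughly 10% of heads trainable and freezing the rest.

```python
import numpy as np

def select_heads(scores, keep_ratio=0.1):
    """Return sorted indices of the highest-scoring heads (at least one)."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(keep_ratio * scores.size)))
    # argsort is ascending, so the last k entries are the top-k scores
    return np.sort(np.argsort(scores)[-k:])

def head_gradient_mask(num_heads, head_dim, keep):
    """Boolean mask over the num_heads * head_dim rows of a projection."""
    mask = np.zeros(num_heads * head_dim, dtype=bool)
    for h in keep:
        mask[h * head_dim:(h + 1) * head_dim] = True
    return mask

def masked_sgd_step(weight, grad, mask, lr=1e-2):
    """Update only rows that belong to kept heads; other rows stay frozen."""
    return weight - lr * grad * mask[:, None]

# Toy setup (assumed shapes): 20 heads of dimension 4, model width 16.
rng = np.random.default_rng(0)
num_heads, head_dim, d_model = 20, 4, 16
base = rng.normal(size=(num_heads * head_dim, d_model))
tuned = base + rng.normal(scale=0.01, size=base.shape)
# Make heads 3 and 11 shift much more than the others.
tuned[3 * head_dim:4 * head_dim] += 1.0
tuned[11 * head_dim:12 * head_dim] += 1.0

# Stand-in sensitivity score (NOT the paper's metric): per-head shift norm.
scores = np.linalg.norm((tuned - base).reshape(num_heads, -1), axis=1)
keep = select_heads(scores, keep_ratio=0.1)
mask = head_gradient_mask(num_heads, head_dim, keep)

grad = np.ones_like(base)  # placeholder gradient for the demonstration
updated = masked_sgd_step(base, grad, mask)
```

In a real training loop the same mask would typically be applied to the gradients of the attention projections before the optimizer step, so that frozen heads keep their pretrained weights. Because the mask depends only on head indices, it can in principle be stored and reused, which is the property the abstract highlights when it contrasts ALPS with data-dependent selection.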

Taxonomy

Methods: Softmax · Attention Is All You Need