SHRP: Specialized Head Routing and Pruning for Efficient Encoder Compression
Zeli Su, Ziyin Zhang, Wenzheng Zhang, Zhou Liu, Guixian Xu, Wentao Zhang

TL;DR
SHRP introduces a structured pruning method for transformer encoders that significantly reduces model size and computation while maintaining high accuracy, enabling more efficient real-time NLP applications.
Contribution
The paper proposes SHRP, a novel framework that automatically prunes redundant attention heads in transformers using a dynamic routing mechanism, improving efficiency without major accuracy loss.
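To make "structured" head pruning concrete, here is a minimal NumPy sketch of the generic technique: removing a head means slicing its columns out of the query, key, and value projections and its rows out of the output projection, which leaves a smaller dense layer. This is not SHRP's routing mechanism, which the summary does not describe in detail. The helper names (`multi_head_attention`, `prune_heads`), the toy dimensions, and the hand-picked `keep` list are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Plain multi-head self-attention without biases (hypothetical helper).

    x: (seq, d_model); wq, wk, wv: (d_model, num_heads * d_head);
    wo: (num_heads * d_head, d_model).
    """
    seq = x.shape[0]
    d_head = wq.shape[1] // num_heads

    def split(t):
        # (seq, heads * d_head) -> (heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    ctx = softmax(scores) @ v  # (heads, seq, d_head)
    merged = ctx.transpose(1, 0, 2).reshape(seq, num_heads * d_head)
    return merged @ wo

def prune_heads(wq, wk, wv, wo, num_heads, keep):
    """Return projection matrices containing only the heads listed in `keep`."""
    d_head = wq.shape[1] // num_heads
    cols = np.concatenate(
        [np.arange(h * d_head, (h + 1) * d_head) for h in keep]
    )
    return wq[:, cols], wk[:, cols], wv[:, cols], wo[cols, :]

# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, num_heads, d_head, seq = 16, 4, 4, 5
x = rng.standard_normal((seq, d_model))
wq, wk, wv = (rng.standard_normal((d_model, num_heads * d_head)) for _ in range(3))
wo = rng.standard_normal((num_heads * d_head, d_model))

keep = [0, 2]  # assumed selection; a real method would learn which heads to keep
pq, pk, pv, po = prune_heads(wq, wk, wv, wo, num_heads, keep)
pruned_out = multi_head_attention(x, pq, pk, pv, po, num_heads=len(keep))

# Reference: the full-size layer with the dropped heads' output rows zeroed.
wo_masked = wo.copy()
for h in range(num_heads):
    if h not in keep:
        wo_masked[h * d_head:(h + 1) * d_head, :] = 0.0
masked_out = multi_head_attention(x, wq, wk, wv, wo_masked, num_heads)

params_before = wq.size + wk.size + wv.size + wo.size
params_after = pq.size + pk.size + pv.size + po.size
```

The pruned layer reproduces the output of the masked full-size layer while storing half the attention weights in this toy setup. That is why head-level pruning yields real savings on standard hardware without needing sparse kernels.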
Findings
Achieves 93% of original accuracy with 48% parameter reduction.
Maintains 84% accuracy even when 11 of 12 layers are pruned.
Delivers 4.2x throughput gain with only 11.5% of original FLOPs.
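The throughput and FLOPs figures can be put on a common scale with a short calculation. The sketch below assumes both numbers describe the same pruned configuration, which this summary does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the findings above.
flops_fraction = 0.115   # pruned model uses 11.5% of the original FLOPs
throughput_gain = 4.2    # reported throughput multiplier

# If throughput scaled perfectly with FLOPs, the gain would equal this factor.
ideal_speedup = 1.0 / flops_fraction

# Share of the ideal FLOP-based speedup that appears as measured throughput.
realized_share = throughput_gain / ideal_speedup

print(f"ideal speedup from FLOPs alone: {ideal_speedup:.2f}x")
print(f"share realized as throughput:   {realized_share:.1%}")
```

Under that assumption, the measured gain is roughly half of what the FLOP reduction alone would suggest. A gap like this is common: memory traffic, kernel launch overhead, and the unpruned parts of the model do not shrink in proportion to FLOPs.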
Abstract
Transformer encoders are widely deployed in large-scale web services for natural language understanding tasks such as text classification, semantic retrieval, and content ranking. However, their high inference latency and memory consumption pose significant challenges for real-time serving and scalability. These limitations stem largely from architectural redundancy, particularly in the attention module. The inherent parameter redundancy of the attention mechanism, coupled with the fact that its attention heads operate with a degree of independence, makes it particularly amenable to structured model compression. In this paper, we propose SHRP (Specialized Head Routing and Pruning), a novel structured pruning framework that automatically identifies and removes redundant attention heads while preserving most of the model's accuracy and compatibility. SHRP introduces Expert Attention, a…
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Green IT and Sustainability
