Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning
Yuxi Guo, Paul Sheridan

TL;DR
Greedy-Gnorm introduces a dynamic, gradient-based head pruning method for transformers that adaptively recalculates head importance, leading to more efficient model compression while preserving accuracy.
Contribution
It proposes a novel gradient matrix norm-based head importance scoring method that updates importance after each pruning step, improving over static importance measures.
Findings
Outperforms attention entropy in head pruning effectiveness.
Maintains high accuracy with substantial head removal.
Reduces model size while preserving task performance.
Abstract
Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores, which fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm), a novel head pruning algorithm that dynamically recalculates head importance after each pruning step. Specifically, each head is scored by the elementwise product of the l2-norms of its Q/K/V gradient blocks, as estimated from a hold-out validation set and updated at every greedy iteration. This dynamic approach to scoring mitigates against stale rankings and better reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa demonstrate that Greedy-Gnorm consistently…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
