SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask
Liu Hanzuo, Chaofan Lin, Weixuan Sun, Yulong Wang, Key, Rayying, Mingyu Gao

TL;DR
SparseForge is a post-training sparsification method for large language models that recovers accuracy by optimizing the sparsity mask itself, using Hessian-guided importance estimation and soft-mask annealing, rather than by scaling up retraining tokens.
Contribution
It introduces a Hessian-aware soft-mask annealing technique for semi-structured LLM sparsification that needs far fewer retraining tokens (5B, versus the 40B used by state-of-the-art methods).
Findings
Achieves 57.27% average zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity with only 5B retraining tokens.
Surpasses the dense model's accuracy (56.43%) and approaches state-of-the-art sparsification methods that use 40B retraining tokens.
Demonstrates consistent accuracy-efficiency improvements across different model families.
Abstract
Semi-structured sparsity provides a practical path to accelerate large language models (LLMs) with native hardware support, but post-training semi-structured pruning often suffers from substantial quality degradation due to strong structural coupling. Existing methods rely on large-scale sparse retraining to recover accuracy, resulting in high computational cost. We propose SparseForge, a post-training framework that improves recovery efficiency by directly optimizing the sparsity mask rather than scaling up retraining tokens. SparseForge combines Hessian-aware importance estimation with progressive annealing of soft masks into hardware-executable structured sparsity, enabling stable and efficient sparse recovery. On LLaMA-2-7B under 2:4 sparsity, SparseForge achieves 57.27% average zero-shot accuracy with only 5B retraining tokens, surpassing the dense model's 56.43%…
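To make the idea of annealing a soft mask into a 2:4 pattern concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the saliency formula (squared weight times a diagonal Hessian estimate), the sigmoid relaxation, the per-group threshold, and the names `importance`, `soft_mask_2of4`, and `hard_mask_2of4` are all assumptions chosen for illustration. The only property taken from the abstract is the target constraint, namely that each group of four weights keeps exactly two.

```python
import math

def importance(weights, hess_diag):
    # Assumed saliency (not from the paper): squared weight scaled by a
    # diagonal Hessian estimate, in the spirit of second-order pruning.
    return [w * w * h for w, h in zip(weights, hess_diag)]

def soft_mask_2of4(scores, temperature):
    # Relaxed 2:4 mask: within each group of 4, apply a sigmoid to the score
    # centered on the midpoint between the 2nd and 3rd largest scores.
    if len(scores) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    mask = []
    for i in range(0, len(scores), 4):
        group = scores[i:i + 4]
        ranked = sorted(group, reverse=True)
        threshold = 0.5 * (ranked[1] + ranked[2])
        for s in group:
            z = (s - threshold) / temperature
            z = max(-60.0, min(60.0, z))  # clamp so exp() cannot overflow
            mask.append(1.0 / (1.0 + math.exp(-z)))
    return mask

def hard_mask_2of4(scores):
    # Hardware-executable 2:4 mask: keep the 2 highest-scoring of every 4.
    mask = []
    for i in range(0, len(scores), 4):
        group = scores[i:i + 4]
        keep = sorted(range(4), key=lambda j: group[j], reverse=True)[:2]
        mask.extend(1 if j in keep else 0 for j in range(4))
    return mask

# Toy data: 8 weights, uniform (hypothetical) Hessian diagonal.
weights = [0.9, -0.1, 0.4, 0.05, -0.2, 0.7, 0.3, -0.8]
scores = importance(weights, [1.0] * 8)

# Anneal: as the temperature drops, the soft mask sharpens toward the hard one.
for temperature in (10.0, 0.1, 1e-4):
    print(temperature, [round(m, 3) for m in soft_mask_2of4(scores, temperature)])
print("hard", hard_mask_2of4(scores))
```

At a high temperature every entry sits near 0.5, so all weights still contribute; as the temperature decreases, the entries move toward 0 or 1 and the mask converges to the hard 2:4 pattern. In a real recovery loop the soft mask would multiply the weights during retraining, a detail this sketch leaves out.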
