AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
Rui Cen, QiangQiang Hu, Hong Huang, Hong Liu, Song Liu, Xin Luo, Lin Niu, Yifan Tan, Decheng Wu, Linchuan Xie, Rubing Yang, Guanghua Yu, Jianchen Zhu (Hunyuan AI Infra Team)

TL;DR
AngelSlim is a versatile toolkit that combines multiple advanced model compression techniques, including quantization, pruning, and speculative decoding, to enable efficient deployment of large models across various modalities.
Contribution
The paper introduces AngelSlim, a unified toolkit integrating state-of-the-art algorithms for large model compression, including novel methods like ultra-low-bit quantization and multimodal pruning strategies.
Findings
Achieved 1.8x to 2.0x throughput gains with speculative decoding.
Developed a training-free sparse attention framework for long-context scenarios.
Introduced HY-1.8B-int2, presented as the first industrially viable 2-bit large model.
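To make the speculative decoding finding concrete, here is a minimal sketch of generic greedy speculative decoding. It is not AngelSlim's training-aligned framework: the function names, the toy models, and the block size `k` are all hypothetical and chosen only for illustration. The sketch shows why the technique can raise throughput without changing the output: a cheap draft model proposes several tokens, the target model checks them, and the first mismatch is replaced with the target's own token.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding with the target model only."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # 2) The target model checks each proposed position. A real engine
        #    would score all positions in one batched forward pass; this
        #    loop only mimics the accept/reject logic.
        accepted: List[int] = []
        all_accepted = True
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token instead
                all_accepted = False
                break

        # 3) If every draft token matched, the target yields one bonus token.
        if all_accepted and len(out) + len(accepted) < limit:
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out


# Toy models (assumptions): the draft agrees with the target except
# when the last token is 5.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] * 3 + 1) % 7


spec = speculative_decode(toy_target, toy_draft, [1], max_new=12, k=4)
base = greedy_decode(toy_target, [1], max_new=12)
print(spec == base)
```

Because every emitted token is either confirmed or supplied by the target model, the greedy output is identical to ordinary decoding. Any speedup comes from how many draft tokens are accepted per target pass.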
Abstract
This technical report introduces AngelSlim, a comprehensive and versatile toolkit for large model compression developed by the Tencent Hunyuan team. By consolidating cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation, AngelSlim provides a unified pipeline that streamlines the transition from model compression to industrial-scale deployment. To facilitate efficient acceleration, we integrate state-of-the-art FP8 and INT8 Post-Training Quantization (PTQ) algorithms alongside pioneering research in ultra-low-bit regimes, featuring HY-1.8B-int2 as the first industrially viable 2-bit large model. Beyond quantization, we propose a training-aligned speculative decoding framework compatible with multimodal architectures and modern inference engines, achieving 1.8x to 2.0x throughput gains without compromising output correctness. Furthermore,…
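As background for the INT8 post-training quantization mentioned in the abstract, the following is a minimal sketch of symmetric per-tensor round-to-nearest quantization, the simplest PTQ baseline. It is not one of the algorithms AngelSlim integrates: the function names and the sample weights are hypothetical, and production methods typically add calibration data and per-channel or per-group scales.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0, 0.333]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_err)
```

The same idea extends to lower bit widths by shrinking the integer range, which increases the step size and the rounding error. That trade-off is why 2-bit regimes generally need more than plain rounding.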
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Advanced Data Compression Techniques
