Sirius: Contextual Sparsity with Correction for Efficient LLMs
Yang Zhou, Zhuoming Chen, Zhaozhuo Xu, Victoria Lin, Beidi Chen

TL;DR
Sirius is a correction mechanism that improves the reasoning performance of large language models compressed with Contextual Sparsity (CS), recovering accuracy on complex tasks while retaining the efficiency gains of sparsity.
Contribution
This paper introduces Sirius, a novel correction method that recovers the reasoning accuracy of CS-based LLMs while preserving their inference efficiency.
Findings
Sirius significantly improves CS model performance on reasoning tasks.
Sirius reduces latency by roughly 20% for an 8B model run on-chip and 35% for a 70B model run with offloading.
Sirius maintains efficiency gains while enhancing model accuracy.
Abstract
With the rise of large language models (LLMs), inference efficiency has become increasingly important. Various approximation methods have been proposed to reduce inference-time cost. Contextual Sparsity (CS) is appealing because it is training-free and can reach a higher compression ratio seemingly without quality degradation. However, after a comprehensive evaluation of CS methods on various complex generation tasks, we find that although CS succeeds on prompt-understanding tasks, it significantly degrades model performance on reasoning, deduction, and knowledge-based tasks. Despite the gap in end-to-end accuracy, we observe that sparse models often share the general problem-solving logic of the full model and require only a few token corrections to recover the original model's performance. This paper introduces Sirius, an efficient correction mechanism, which significantly…
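As a rough illustration of what contextual sparsity means, the sketch below keeps only the k most strongly activated hidden neurons of a toy ReLU MLP for each input and compares the result with the dense output. It is not the Sirius correction mechanism and not the authors' implementation. The names `dense_mlp` and `sparse_mlp`, the toy dimensions, and the exact top-k selection are all assumptions made for this example; practical CS methods typically predict the active set cheaply rather than computing every activation first.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dense_mlp(W1, W2, x):
    """Dense two-layer MLP: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def sparse_mlp(W1, W2, x, k):
    """Toy contextual sparsity: use only the k largest hidden activations.

    Assumption: the active set is chosen by exact top-k on the hidden
    activations, which is for illustration only.
    """
    h = relu(matvec(W1, x))
    # Input-dependent ("contextual") set of active neurons.
    active = sorted(range(len(h)), key=lambda i: h[i], reverse=True)[:k]
    # Only the active neurons contribute to the output.
    out = [sum(row[i] * h[i] for i in active) for row in W2]
    return out, active


random.seed(0)
d_in, d_hidden, d_out = 4, 16, 3
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [random.gauss(0, 1) for _ in range(d_in)]

dense = dense_mlp(W1, W2, x)
sparse_full, _ = sparse_mlp(W1, W2, x, k=d_hidden)         # no sparsity
sparse_half, active = sparse_mlp(W1, W2, x, k=d_hidden // 2)  # 50% of neurons

# Approximation error introduced by dropping half of the neurons.
error = max(abs(a - b) for a, b in zip(dense, sparse_half))
print("active neurons:", sorted(active))
print("max abs error vs dense:", error)
```

With `k` equal to the hidden size the sparse path reproduces the dense output, and shrinking `k` trades accuracy for less computation. That accuracy gap is the kind of degradation the abstract says a correction mechanism needs to address.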
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Cryptography and Data Security
