CPTQuant - A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda, Sree Bhargavi Balija, Debashis Sahoo

TL;DR
CPTQuant introduces three mixed precision post-training quantization methods, based on correlation analysis, pruning, and Taylor decomposition, that reduce the size and computational demands of large language models while maintaining accuracy.
Contribution
The paper presents CPTQuant, a comprehensive framework combining three novel mixed precision quantization strategies tailored for large language models, enhancing compression and efficiency.
Findings
Up to 4x model compression with minimal accuracy loss.
PMPQ achieves higher compression ratios than existing methods.
TDMPQ attains 30% greater compression for language modeling tasks.
Abstract
Large language models have transformed natural language comprehension and generation tasks, but they come with substantial memory and computational requirements. Quantization techniques have emerged as a promising avenue for addressing these challenges while preserving accuracy and improving energy efficiency. We propose CPTQuant, a comprehensive strategy that introduces correlation-based (CMPQ), pruning-based (PMPQ), and Taylor decomposition-based (TDMPQ) mixed precision techniques. CMPQ adapts the precision level based on canonical correlation analysis of different layers. PMPQ optimizes precision layer-wise based on each layer's sensitivity to sparsity. TDMPQ modifies precision using Taylor decomposition to assess each layer's sensitivity to input perturbation. These strategies allocate higher precision to more sensitive layers and lower precision to robust layers. CPTQuant…
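The allocation idea shared by the three techniques (higher precision for sensitive layers, lower precision for robust ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the sensitivity scores are placeholder numbers standing in for whatever CMPQ, PMPQ, or TDMPQ would compute, and the rank-based grouping rule, the candidate bit widths (2, 4, 8), the 16-bit baseline, and the names `allocate_bits` and `compression_ratio` are all assumptions made for illustration.

```python
def allocate_bits(sensitivity, bit_choices=(2, 4, 8)):
    """Map per-layer sensitivity scores to bit widths (hypothetical rule).

    Layers are ranked by sensitivity and split into len(bit_choices)
    roughly equal groups; the least sensitive group gets the smallest
    bit width and the most sensitive group gets the largest.
    """
    choices = sorted(bit_choices)
    ranked = sorted(sensitivity, key=sensitivity.get)  # least sensitive first
    n = len(ranked)
    bits = {}
    for rank, name in enumerate(ranked):
        group = rank * len(choices) // n  # group index in 0 .. len(choices) - 1
        bits[name] = choices[group]
    return bits


def compression_ratio(bits, param_counts, baseline_bits=16):
    """Ratio of baseline storage to mixed precision storage, in bits.

    baseline_bits=16 is an assumed starting precision, not a value
    taken from the paper.
    """
    original = sum(param_counts.values()) * baseline_bits
    quantized = sum(param_counts[name] * bits[name] for name in bits)
    return original / quantized


# Placeholder scores: a real pipeline would derive these from a
# correlation-, pruning-, or Taylor-based sensitivity analysis.
sensitivity = {"layer0": 0.9, "layer1": 0.1, "layer2": 0.5, "layer3": 0.3}
param_counts = {name: 1000 for name in sensitivity}

bits = allocate_bits(sensitivity)
ratio = compression_ratio(bits, param_counts)
```

In this toy setup the most sensitive layer keeps 8 bits while the two least sensitive layers drop to 2 bits; the resulting ratio depends entirely on the assumed scores, layer sizes, and baseline, so it says nothing about the figures reported in the paper.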
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Softmax · Linear Warmup With Linear Decay · Multi-Head Attention · WordPiece · Dropout · Dense Connections · Layer Normalization · Weight Decay
