When Less is More: 8-bit Quantization Improves Continual Learning in Large Language Models
Michael S. Zhang, Rishi A. Ruia, Arnav Kewalram, Saathvik Dharmapuram, Utkarsh Sharma, Kevin Zhu

TL;DR
This paper demonstrates that 8-bit quantization enhances continual learning in large language models by improving knowledge retention and task performance, challenging the notion that higher precision always yields better results.
Contribution
The study shows that INT8 quantization, combined with minimal replay buffers, outperforms higher-precision models in continual learning, offering new insight into quantization as a regularizer and into deployment efficiency.
Findings
INT8 models outperform FP16 on final-task forward accuracy
Minimal replay buffers significantly improve retention across precisions
Quantization acts as implicit regularization, aiding continual learning
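The rounding noise that the regularization finding refers to can be pictured with a minimal sketch. This summary does not say which quantization scheme the paper uses, so the symmetric per-tensor scheme below, the function names, and the synthetic Gaussian weights are all assumptions made for illustration. It shows only that lower bit widths perturb weights more, not that this perturbation is the mechanism behind the reported results.

```python
import random

def quantize_symmetric(weights, num_bits=8):
    """Symmetric per-tensor quantization (an assumed scheme, not the paper's)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to floats."""
    return [v * scale for v in q]

def max_rounding_error(weights, num_bits):
    """Largest absolute difference between a weight and its round trip."""
    q, scale = quantize_symmetric(weights, num_bits)
    restored = dequantize(q, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))

# Synthetic weights, for illustration only.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(1000)]

err_int8 = max_rounding_error(weights, 8)
err_int4 = max_rounding_error(weights, 4)
print(f"max rounding error  INT8: {err_int8:.6f}  INT4: {err_int4:.6f}")
```

The round-trip error is bounded by half the quantization step, so each weight is stored with a small, bounded perturbation whose size grows as the bit width shrinks.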
Abstract
Catastrophic forgetting poses a fundamental challenge in continual learning, particularly when models are quantized for deployment efficiency. We systematically investigate the interplay between quantization precision (FP16, INT8, INT4) and replay buffer strategies in large language models, revealing unexpected dynamics. While FP16 achieves superior initial task performance (74.44% on NLU), we observe a striking inversion on subsequent tasks: quantized models outperform FP16 by 8-15% on final-task forward accuracy, with INT4 achieving nearly double FP16's performance on Code generation (40% vs 20%). Critically, even minimal replay buffers (0.1%) dramatically improve retention, increasing NLU retention after Math training from 45% to 65% across all precision levels, with INT8 consistently achieving the optimal balance between learning plasticity and knowledge retention. We hypothesize…
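A minimal replay buffer of the kind the abstract describes can be sketched as follows. The abstract gives only the 0.1% figure, so several details here are assumptions: that the fraction is measured against the previous task's training set, that examples are drawn uniformly at random, and that replayed examples are simply shuffled into the next task's data. The function names and dataset sizes are hypothetical.

```python
import random

def build_replay_buffer(previous_task_examples, fraction=0.001, seed=0):
    """Keep a uniformly sampled fraction of an earlier task's examples.

    fraction=0.001 corresponds to the 0.1% buffer size in the abstract;
    uniform sampling is an assumption, not the paper's stated strategy.
    """
    rng = random.Random(seed)
    k = max(1, round(len(previous_task_examples) * fraction))
    return rng.sample(previous_task_examples, k)

def mix_with_replay(current_task_examples, replay_buffer, seed=0):
    """Shuffle the replayed examples into the current task's training data."""
    rng = random.Random(seed)
    mixed = list(current_task_examples) + list(replay_buffer)
    rng.shuffle(mixed)
    return mixed

# Placeholder datasets; the sizes are invented for illustration.
nlu_examples = [("nlu", i) for i in range(10_000)]
math_examples = [("math", i) for i in range(5_000)]

buffer = build_replay_buffer(nlu_examples, fraction=0.001)
train_set = mix_with_replay(math_examples, buffer)
print(len(buffer), len(train_set))
```

With these placeholder sizes the buffer holds 10 NLU examples, which gives a sense of how small a 0.1% buffer is relative to the task it is meant to protect.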
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
