When Less is More: 8-bit Quantization Improves Continual Learning in Large Language Models

Michael S. Zhang; Rishi A. Ruia; Arnav Kewalram; Saathvik Dharmapuram; Utkarsh Sharma; Kevin Zhu

arXiv:2512.18934 · cs.LG · December 23, 2025

Open Access

TL;DR

This paper demonstrates that 8-bit quantization enhances continual learning in large language models by improving knowledge retention and task performance, challenging the notion that higher precision always yields better results.

Contribution

The study shows that INT8 quantization, combined with minimal replay buffers, outperforms higher-precision (FP16) models in continual learning, offering new insight into model regularization and deployment efficiency.

Findings

01. INT8 models outperform FP16 on final task accuracy

02. Minimal replay buffers significantly improve retention across precisions

03. Quantization acts as implicit regularization, aiding continual learning
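
To make the regularization finding concrete, the sketch below shows the rounding perturbation that low-bit quantization applies to a weight vector. It is a minimal illustration, assuming symmetric per-tensor quantization; the function names, the Gaussian weights, and the scheme itself are assumptions, since this page does not state which quantization method the paper uses.

```python
import random


def fake_quantize(weights, bits=8):
    """Quantize to signed integers of the given width, then dequantize.

    Assumed scheme: symmetric, per-tensor, round-to-nearest.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax  # real-valued width of one integer step
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(original, quantized):
    """Average size of the perturbation introduced by quantization."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


# Hypothetical weights: small Gaussian values standing in for one layer.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]

err_int8 = mean_abs_error(weights, fake_quantize(weights, bits=8))
err_int4 = mean_abs_error(weights, fake_quantize(weights, bits=4))
print(f"mean |error| INT8: {err_int8:.6f}, INT4: {err_int4:.6f}")
```

Each weight moves by at most half a quantization step, so fewer bits mean a coarser grid and a larger perturbation. Whether that perturbation explains the reported retention gains is the paper's hypothesis, not something this sketch demonstrates.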

Abstract

Catastrophic forgetting poses a fundamental challenge in continual learning, particularly when models are quantized for deployment efficiency. We systematically investigate the interplay between quantization precision (FP16, INT8, INT4) and replay buffer strategies in large language models, revealing unexpected dynamics. While FP16 achieves superior initial task performance (74.44% on NLU), we observe a striking inversion on subsequent tasks: quantized models outperform FP16 by 8-15% on final task forward accuracy, with INT4 achieving nearly double FP16's performance on Code generation (40% vs 20%). Critically, even minimal replay buffers (0.1%) dramatically improve retention (NLU retention after Math training rises from 45% to 65% across all precision levels), with INT8 consistently achieving the optimal balance between learning plasticity and knowledge retention. We hypothesize…
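
The replay strategy in the abstract can be sketched as follows. This is a minimal illustration, assuming that "0.1%" means 0.1% of the previous task's training examples and that replayed examples are simply shuffled into the next task's data; the function names and dataset sizes are hypothetical, and the paper's actual sampling and mixing procedure may differ.

```python
import random


def build_replay_buffer(prev_examples, fraction=0.001, seed=0):
    """Keep a uniformly sampled fraction of an earlier task's examples."""
    if not prev_examples:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(prev_examples) * fraction))  # keep at least one
    return rng.sample(prev_examples, k)


def mix_with_replay(new_examples, replay_buffer, seed=0):
    """Shuffle replayed examples into the new task's training data."""
    rng = random.Random(seed)
    mixed = list(new_examples) + list(replay_buffer)
    rng.shuffle(mixed)
    return mixed


# Hypothetical dataset sizes; the task order (NLU, then Math) follows the abstract.
nlu_train = [("nlu", i) for i in range(20_000)]
math_train = [("math", i) for i in range(5_000)]

buffer = build_replay_buffer(nlu_train, fraction=0.001)
math_stream = mix_with_replay(math_train, buffer)
print(len(buffer), len(math_stream))
```

With these sizes, the buffer holds 20 NLU examples next to 5,000 Math examples, which shows how small a 0.1% buffer is relative to the new task's data.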

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis