SCALE: Upscaled Continual Learning of Large Language Models

Jin-woo Lee; Junhwa Choi; Bongkyu Hwang; Jinho Choo; Bogun Kim; JeongSeon Yi; Joonseok Lee; DongYoung Jung; Jaeseon Park; Kyoungwon Park; Suk-hoon Jung

arXiv:2511.03270 · cs.CL · December 12, 2025

TL;DR

SCALE introduces a width upscaling architecture for large language models that preserves pre-trained knowledge while efficiently acquiring new information through selective expansion and training strategies.

Contribution

The paper proposes a novel width upscaling method, SCALE, that maintains model stability during continual learning by combining preservation and adaptation principles with lightweight expansions.

Findings

01. SCALE reduces forgetting in synthetic and real-world benchmarks.

02. SCALE achieves competitive performance on Korean language tasks.

03. The approach stabilizes optimization compared to standard continual learning methods.

Abstract

We revisit continual pre-training for large language models and argue that progress now depends more on scaling the right structure than on scaling parameters alone. We introduce SCALE, a width upscaling architecture that inserts lightweight expansion into linear modules while freezing all pre-trained parameters. This preserves the residual and attention topologies and increases capacity without perturbing the base model's original functionality. SCALE is guided by two principles: Persistent Preservation, which maintains the base model's behavior via preservation-oriented initialization and freezing of the pre-trained weights, and Collaborative Adaptation, which selectively trains a subset of expansion components to acquire new knowledge with minimal interference. We instantiate these ideas as SCALE-Preserve (preservation-first), SCALE-Adapt (adaptation-first), and SCALE-Route, an…
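The abstract's core mechanism, widening a linear module while freezing the pre-trained weights and initializing the expansion so the base behavior is unchanged, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the construction, so the example below assumes one common function-preserving choice, in which new hidden units of a feed-forward block get small random input weights and zero output weights. The names `upscale_width`, `mlp`, and `trainable_units` are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp(W_up, W_down, x):
    """Two-layer feed-forward block: W_down @ relu(W_up @ x)."""
    return matvec(W_down, relu(matvec(W_up, x)))

def upscale_width(W_up, W_down, extra, rng):
    """Widen the hidden dimension by `extra` units (assumed construction).

    Pre-trained entries are copied unchanged and are meant to stay frozen.
    New hidden units get random input weights and zero output weights,
    so the widened block computes the same function as the base block
    at initialization.
    """
    d_in = len(W_up[0])
    hidden = len(W_up)
    new_rows = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(extra)]
    W_up_big = [row[:] for row in W_up] + new_rows
    W_down_big = [row[:] + [0.0] * extra for row in W_down]
    # Only the indices of the new hidden units would receive gradient updates.
    trainable_units = list(range(hidden, hidden + extra))
    return W_up_big, W_down_big, trainable_units

rng = random.Random(0)
d_model, d_hidden, extra = 4, 6, 2
W_up = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
W_down = [[rng.gauss(0.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

W_up_big, W_down_big, trainable_units = upscale_width(W_up, W_down, extra, rng)
base_out = mlp(W_up, W_down, x)
big_out = mlp(W_up_big, W_down_big, x)
# Largest deviation between the base and widened outputs at initialization.
max_diff = max(abs(a - b) for a, b in zip(base_out, big_out))
```

In this sketch the widened block matches the base block at initialization because each new unit's output weight is zero. Training only the rows and columns listed in `trainable_units` then adds capacity without touching the pre-trained entries, which is the general idea behind the preservation and adaptation principles the abstract names.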

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Generative Adversarial Networks and Image Synthesis