Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model
Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu

TL;DR
This paper presents PLRD, a novel method for compressing large language models by incrementally reducing tensor ranks, so that a family of smaller models can be derived from one pre-trained model with minimal retraining and resource use.
Contribution
The introduction of Progressive Low Rank Decomposition (PLRD), a new technique for compressing large language models without retraining from scratch.
Findings
Models compressed with PLRD perform comparably to traditionally trained models while using only 0.1% of their training tokens.
PLRD enables generation of multiple model sizes from a single base model.
PLRD significantly reduces computational and energy costs.
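To illustrate how a single base model can yield a spectrum of sizes, here is a minimal Python sketch of the parameter arithmetic for one weight matrix W replaced by a product of two factors, W ≈ UV, where U is d_out × r and V is r × d_in. The 4096 × 4096 shape and the rank schedule are hypothetical, not taken from the paper, biases are ignored, and the function names are illustrative only.

```python
def dense_params(d_out: int, d_in: int) -> int:
    """Parameters in an unfactored d_out x d_in weight matrix (bias ignored)."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters when W is replaced by U @ V with U: d_out x rank, V: rank x d_in."""
    return rank * (d_out + d_in)


def break_even_rank(d_out: int, d_in: int) -> int:
    """Largest rank whose factored form is strictly smaller than the dense matrix."""
    # rank * (d_out + d_in) < d_out * d_in  <=>  rank < d_out * d_in / (d_out + d_in)
    return (d_out * d_in - 1) // (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in = 4096, 4096  # hypothetical square projection matrix
    print("dense params:", dense_params(d_out, d_in))
    print("break-even rank:", break_even_rank(d_out, d_in))
    # Hypothetical rank schedule: each family member uses a lower rank than the last.
    for rank in (1536, 1024, 512, 256):
        n = low_rank_params(d_out, d_in, rank)
        ratio = n / dense_params(d_out, d_in)
        print(f"rank {rank:4d}: {n:>9d} params ({ratio:.1%} of dense)")
```

The break-even rank is the design constraint here: a factored layer only saves parameters when r(d_out + d_in) < d_out·d_in, so for a square 4096 × 4096 matrix any rank of 2048 or more is no smaller than the dense weight.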
Abstract
This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach starts from a pre-trained model, which is then incrementally compressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived from the original without the need for retraining from scratch. We detail the implementation of PLRD, which strategically decreases the tensor ranks, thus optimizing the trade-off between model performance and resource usage. The efficacy of PLRD is demonstrated through extensive experiments showing that models trained with the PLRD method on only 1B tokens perform comparably to traditionally trained models while using 0.1% of the tokens. The versatility of PLRD is highlighted by…
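The abstract describes deriving smaller models by progressively lowering the rank. Below is a minimal NumPy sketch of that idea under our own assumptions, not the authors' implementation: each weight matrix is factored with a truncated SVD, and each stage is decomposed from the previous stage's result rather than from the original dense weight. The names `truncated_factors` and `progressive_decomposition` are hypothetical, and the short training run that the paper pairs with PLRD (1B tokens) is omitted, so this shows only the rank-reduction step.

```python
import numpy as np


def truncated_factors(weight: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor `weight` (d_out x d_in) into U_r (d_out x rank) and V_r (rank x d_in)
    so that U_r @ V_r is the best rank-`rank` approximation (truncated SVD)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    # Split the kept singular values evenly between the two factors.
    u_r = u[:, :rank] * sqrt_s          # scales each column of U
    v_r = sqrt_s[:, None] * vt[:rank]   # scales each row of V^T
    return u_r, v_r


def progressive_decomposition(
    weight: np.ndarray, ranks: list[int]
) -> list[tuple[np.ndarray, np.ndarray]]:
    """Hypothetical PLRD-style schedule: stage k is decomposed from stage k-1's
    reconstruction, not from the original dense weight."""
    if any(later >= earlier for earlier, later in zip(ranks, ranks[1:])):
        raise ValueError("ranks must be strictly decreasing")
    family = []
    current = weight
    for rank in ranks:
        u_r, v_r = truncated_factors(current, rank)
        # Assumption: a short retraining run of (u_r, v_r) would go here; omitted.
        family.append((u_r, v_r))
        current = u_r @ v_r
    return family


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 128))  # stand-in for one pre-trained weight matrix
    for u_r, v_r in progressive_decomposition(w, [96, 64, 32]):
        rel_err = np.linalg.norm(w - u_r @ v_r) / np.linalg.norm(w)
        print(u_r.shape, v_r.shape, f"relative error {rel_err:.3f}")
```

Without the retraining step, truncating the previous stage's reconstruction yields the same matrix as truncating the original directly, because a rank-k SVD truncation keeps the top singular vectors unchanged. The progressive ordering only changes the outcome once each stage is retrained before the next decomposition.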
