Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

Habib Hajimolahoseini; Mohammad Hassanpour; Foozhan Ataiefard; Boxing Chen; Yang Liu

arXiv:2406.19995·cs.CL·July 1, 2024

Open Access

TL;DR

This paper presents PLRD, a novel method for compressing large language models by incrementally reducing tensor ranks, enabling efficient model scaling with minimal retraining and resource use.

Contribution

The introduction of Progressive Low Rank Decomposition (PLRD), a new technique for compressing large language models without retraining from scratch.
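As a rough illustration of why lowering the rank compresses a layer, the standard low-rank parameter count can be written out. The symbols below ($W$, $A_r$, $B_r$, $m$, $n$, $r_i$) are assumptions introduced for illustration and do not appear on this page; the paper's own notation and factorization details may differ.

```latex
% Assumed setting: one weight matrix W replaced by two rank-r factors.
W \in \mathbb{R}^{m \times n}, \qquad
W \approx A_r B_r, \qquad
A_r \in \mathbb{R}^{m \times r}, \quad B_r \in \mathbb{R}^{r \times n}

% Parameter count before and after factorization:
\underbrace{mn}_{\text{dense}} \;\longrightarrow\; \underbrace{r(m+n)}_{\text{factored}},
\qquad
r(m+n) < mn \iff r < \frac{mn}{m+n}

% A progressive schedule uses a decreasing sequence of ranks,
% each member derived from the previous one:
r_1 > r_2 > \dots > r_k
```

Under these assumptions, factorization only saves parameters once the rank falls below $mn/(m+n)$, so every rank in a decreasing schedule has to sit below that threshold.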

Findings

01

Models trained with PLRD on only 1B tokens maintain performance comparable to traditionally trained models while using 0.1% of the tokens.

02

PLRD enables generation of multiple model sizes from a single base model.

03

Deriving smaller models from the original, rather than retraining from scratch, significantly reduces computational overhead and energy consumption.

Abstract

This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally decomposed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived from the original without the need for retraining from scratch. We detail the implementation of PLRD, which strategically decreases the tensor ranks, thus optimizing the trade-off between model performance and resource usage. The efficacy of PLRD is demonstrated through extensive experiments showing that models trained with the PLRD method on only 1B tokens maintain comparable performance with traditionally trained models while using 0.1% of the tokens. The versatility of PLRD is highlighted by…
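The idea of deriving each smaller model from the previous one by lowering the rank can be sketched on a single weight matrix. This is a minimal sketch, not the authors' implementation: it assumes truncated SVD as the decomposition, uses a random matrix in place of a real layer, and leaves out the brief training the abstract describes between steps. The function names `low_rank_factors` and `progressive_family` and the rank schedule are hypothetical.

```python
import numpy as np


def low_rank_factors(W, rank):
    """Return factors (A, B) of the best rank-`rank` approximation of W via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # (m, rank); singular values folded into the left factor
    B = Vt[:rank, :]            # (rank, n)
    return A, B


def progressive_family(W, ranks):
    """Derive one factor pair per rank, each from the previous (larger) family member."""
    family = {}
    current = W
    for rank in sorted(ranks, reverse=True):
        A, B = low_rank_factors(current, rank)
        # Assumption: a real pipeline would fine-tune A and B on a small
        # token budget here before moving on to the next, lower rank.
        family[rank] = (A, B)
        current = A @ B  # the next member starts from this one, not from W
    return family


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # stand-in for one pre-trained weight matrix
family = progressive_family(W, ranks=[24, 16, 8])

for rank, (A, B) in family.items():
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"rank={rank:2d} params={A.size + B.size:5d} (dense {W.size}) rel_err={rel_err:.3f}")
```

Without the intermediate fine-tuning step, this progressive truncation gives the same factors as truncating `W` directly to each rank. Any benefit of the progressive schedule would come from the training performed between steps, which this sketch omits.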

Taxonomy

Topics: Intergenerational Family Dynamics and Caregiving

Methods: Sparse Evolutionary Training