CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

Ziyue Liu; Ruijie Zhang; Zhengyang Wang; Mingsong Yan; Zi Yang; Paul Hovland; Bogdan Nicolae; Franck Cappello; Sui Tang; Zheng Zhang

arXiv:2502.10940 · cs.LG · October 3, 2025

TL;DR

This paper introduces CoLA, a low-rank activation-based method for pre-training large language models more efficiently, reducing computational costs and memory usage while maintaining performance.

Contribution

It proposes an architecture that replaces full-size layers with auto-encoders enforcing low-rank activations, which significantly improves training efficiency and reduces model size.
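The auto-encoder replacement can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the code in the alvin-zyl/cola repository: it assumes each full-size linear map W x is replaced by a down-projection A to rank r, an elementwise nonlinearity σ, and an up-projection B, i.e. y = B σ(A x). The class name `LowRankAutoEncoderLinear`, the SiLU choice for σ, and the initialization scale are hypothetical.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU nonlinearity, x * sigmoid(x); an assumed choice for sigma."""
    return x / (1.0 + np.exp(-x))


class LowRankAutoEncoderLinear:
    """Hypothetical stand-in for one full-size d_in -> d_out linear layer.

    Computes y = B @ sigma(A @ x): a down-projection to rank r, a
    nonlinearity, and an up-projection, so the activation fed into B
    always lives in an r-dimensional space.
    """

    def __init__(self, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Down-projection A (rank x d_in) and up-projection B (d_out x rank).
        self.A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
        self.B = rng.standard_normal((d_out, rank)) / np.sqrt(rank)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        hidden = silu(x @ self.A.T)  # (batch, rank): the low-rank activation
        return hidden @ self.B.T

    def num_params(self) -> int:
        return self.A.size + self.B.size


layer = LowRankAutoEncoderLinear(d_in=512, d_out=512, rank=128)
x = np.random.default_rng(1).standard_normal((4, 512))
y = layer(x)
print(y.shape)             # (4, 512)
print(layer.num_params())  # 131072 (a dense 512 x 512 weight has 262144)
```

With r = d/4 the layer holds half the parameters of a dense d × d weight, and every activation entering B is confined to r dimensions, so the low-rank property holds by construction rather than being hoped for.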

Findings

01. CoLA halves the compute cost of LLM pre-training.
02. CoLA improves training throughput by 86% (1.86×).
03. The resulting models are half the size, enabling faster inference.
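The compute-halving finding can be sanity-checked per layer with a back-of-the-envelope FLOP count. The sketch below assumes a square d × d layer and a rank of r = d/4 (an illustrative choice, not a value stated here), and counts only the two matrix products; attention scores, normalization, and the nonlinearity are ignored, so it does not reproduce the paper's end-to-end measurements. The helper names are hypothetical.

```python
def dense_linear_flops(d_in: int, d_out: int) -> int:
    """Approximate forward FLOPs per token for y = W x (multiply + add)."""
    return 2 * d_in * d_out


def low_rank_flops(d_in: int, d_out: int, rank: int) -> int:
    """Approximate forward FLOPs per token for y = B sigma(A x).

    Counts only the two matrix-vector products; the elementwise
    nonlinearity on the rank-r vector is ignored as lower order.
    """
    return 2 * rank * d_in + 2 * d_out * rank


d = 4096          # hypothetical hidden size
rank = d // 4     # assumed rank choice, r = d / 4
ratio = low_rank_flops(d, d, rank) / dense_linear_flops(d, d)
print(ratio)  # 0.5
```

In general the ratio is r(d_in + d_out) / (d_in · d_out), so the saving depends on the layer shape and vanishes once r reaches d_in · d_out / (d_in + d_out).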

Abstract

The full-size MLPs and the projection layers in attention account for much of the size of large language models (LLMs) and consume extensive computational resources in pre-training. We empirically observe that the activations of pre-trained LLMs exhibit a low-rank property. Motivated by this observation, we propose CoLA and its memory-efficient implementation, CoLA-M, which replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $2 \times$ and improves training throughput by $1.86 \times$ while maintaining full-rank level performance. CoLA-M…
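The low-rank observation can be made concrete by measuring an effective rank of an activation matrix. The sketch below is illustrative only: it assumes effective rank means the number of singular values needed to capture 90% of the squared spectral energy, and it runs on synthetic data rather than activations from a pre-trained LLM, so it does not reproduce the paper's measurements. The function name `effective_rank` and the 0.9 threshold are hypothetical.

```python
import numpy as np


def effective_rank(acts: np.ndarray, energy: float = 0.9) -> int:
    """Smallest k whose top-k singular values hold `energy` of the total
    squared spectral energy of a (tokens x hidden) activation matrix."""
    s = np.linalg.svd(acts, compute_uv=False)  # singular values, descending
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s))


rng = np.random.default_rng(0)
tokens, hidden, true_rank = 1024, 512, 32

# Synthetic "activations": a rank-32 signal plus small isotropic noise.
signal = rng.standard_normal((tokens, true_rank)) @ rng.standard_normal((true_rank, hidden))
low_rank_acts = signal + 0.01 * rng.standard_normal((tokens, hidden))

# Unstructured Gaussian activations for comparison.
full_rank_acts = rng.standard_normal((tokens, hidden))

print(effective_rank(low_rank_acts), effective_rank(full_rank_acts))
```

On a real model one would collect a layer's outputs over a batch of tokens and pass that (tokens × hidden) matrix in place of the synthetic arrays; a value far below `hidden` indicates the kind of activation redundancy that CoLA eliminates by construction.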

Code & Models

Repositories

alvin-zyl/cola · PyTorch · Official

Videos

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation· underline

Taxonomy

Methods: Softmax · Attention Is All You Need · LLaMA · CoLA