CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang

TL;DR
This paper introduces CoLA, a low-rank activation-based method for pre-training large language models more efficiently, reducing computational costs and memory usage while maintaining performance.
Contribution
It proposes a novel architecture that replaces full-size layers with auto-encoders enforcing low-rank activations, significantly improving training efficiency and reducing model size.
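As a rough illustration of the idea, the sketch below contrasts a full-size layer with a bottleneck (auto-encoder style) layer of inner width r. It is a minimal NumPy sketch under assumptions, not the authors' implementation: the function names, the tanh nonlinearity, its placement between the two projections, and the sizes (512, 128) are all hypothetical choices made for the example.

```python
import numpy as np

def full_layer(x, W):
    """Full-size layer: one d_in x d_out projection, then a nonlinearity."""
    return np.tanh(x @ W)

def low_rank_autoencoder_layer(x, A, B):
    """Bottleneck layer (assumed form): project down to width r,
    apply the nonlinearity, then project back up to d_out."""
    return np.tanh(x @ A) @ B

def param_count(d_in, d_out, r=None):
    """Weights in a full layer (r is None) or in the two-factor bottleneck."""
    if r is None:
        return d_in * d_out
    return r * (d_in + d_out)

# Hypothetical sizes chosen for illustration only.
d_in, d_out, r, batch = 512, 512, 128, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
A = rng.standard_normal((d_in, r)) / np.sqrt(d_in)
B = rng.standard_normal((r, d_out)) / np.sqrt(r)

y_full = full_layer(x, W)
y_low = low_rank_autoencoder_layer(x, A, B)

# The bottleneck output is a product with an (r x d_out) matrix,
# so its rank cannot exceed r regardless of the input.
rank_low = np.linalg.matrix_rank(y_low, tol=1e-8)
ratio = param_count(d_in, d_out) / param_count(d_in, d_out, r)
print(f"output rank: {rank_low}, parameter ratio full/low-rank: {ratio}")
```

With these example sizes (r equal to a quarter of the layer width) the bottleneck holds half as many weights as the full layer. The savings in any real model depend on the rank actually chosen.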
Findings
CoLA halves the computing cost of LLM pre-training.
CoLA improves training throughput by 86%.
The resulting models are half the size, enabling faster inference.
Abstract
The full-size MLPs and the projection layers in attention introduce tremendous model sizes of large language models (LLMs), consuming extensive computational resources in pre-training. We empirically observe that the activations of pre-trained LLMs exhibit a low-rank property. Motivated by such observations, we propose CoLA and its memory-efficient implementation, CoLA-M, to replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates the activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by 2× and improves training throughput by 1.86× while maintaining full-rank level performance. CoLA-M…
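One common way to check whether a matrix of activations is approximately low-rank is to count how many singular values are needed to capture most of its spectral energy. The sketch below shows that measurement on synthetic data. The helper name effective_rank, the 95% energy threshold, and the synthetic matrices are assumptions for illustration; the abstract does not state how the authors quantified the low-rank property.

```python
import numpy as np

def effective_rank(acts, energy=0.95):
    """Smallest k such that the top-k singular values hold at least
    `energy` of the total squared singular-value mass."""
    s = np.linalg.svd(acts, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, energy)) + 1
    return min(k, len(s))

rng = np.random.default_rng(0)
tokens, width, true_rank = 256, 64, 8

# Synthetic "activations": a rank-8 matrix plus small noise (illustrative only).
low_rank_acts = (
    rng.standard_normal((tokens, true_rank))
    @ rng.standard_normal((true_rank, width))
    + 1e-3 * rng.standard_normal((tokens, width))
)
# Baseline with no low-rank structure.
dense_acts = rng.standard_normal((tokens, width))

k_low = effective_rank(low_rank_acts)
k_dense = effective_rank(dense_acts)
print(f"effective rank: structured={k_low}, unstructured={k_dense}")
```

To apply the same measurement to a real model, acts would be a (tokens × hidden width) matrix collected from one layer's output.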
Taxonomy
Topics: Advanced Radiotherapy Techniques · Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications
Methods: Softmax · Attention Is All You Need · LLaMA · COLA
