Scalable MatMul-free Language Modeling
Rui-Jie Zhu, Yu Zhang, Steven Abreu, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Sumit Bam Shrestha, Peng Zhou, Jason K. Eshraghian

TL;DR
This paper introduces a novel approach to eliminate matrix multiplication in large language models, significantly reducing memory and energy consumption while maintaining competitive performance at billion-parameter scales.
Contribution
The authors present a scalable MatMul-free architecture for LLMs that performs comparably to pre-trained Transformers at up to 2.7B parameters while substantially reducing memory and energy requirements.
Findings
Memory savings of up to 61% during training
Over 10x reduction in inference memory usage
4x higher throughput and 10x lower energy use on a multi-chip neuromorphic system
Abstract
Large Language Models (LLMs) have fundamentally altered how we approach scaling in machine learning. However, these models pose substantial computational and memory challenges, primarily due to the reliance on matrix multiplication (MatMul) within their attention and feed-forward (FFN) layers. We demonstrate that MatMul operations can be eliminated from LLMs while maintaining strong performance, even at billion-parameter scales. Our MatMul-free models, tested on models up to 2.7B parameters, are comparable to state-of-the-art pre-trained Transformers, and the performance gap narrows as model size increases. Our approach yields significant memory savings: a GPU-efficient implementation reduces memory consumption by up to 61% during training and over 10x during inference. When adapted for a multi-chip neuromorphic system, the model leverages asynchronous processing to achieve 4x higher…
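The abstract does not say how a dense layer can avoid MatMul. One way, and the one used for the paper's dense layers, is to restrict weights to the ternary set {-1, 0, +1}, so that each output is a signed sum of inputs with no multiplications. The following is a minimal pure-Python sketch of that idea only. The function names `ternary_matvec` and `dense_matvec` and the toy values are illustrative assumptions, not the authors' implementation, which relies on fused GPU kernels and also replaces attention.

```python
def ternary_matvec(x, w):
    """Compute y = x @ W for a ternary W using only additions and subtractions.

    x: list of floats, length d_in
    w: d_in rows, each a list of d_out entries drawn from {-1, 0, +1}
    (Hypothetical helper for illustration; not taken from the paper's code.)
    """
    d_out = len(w[0])
    y = [0.0] * d_out
    for xi, row in zip(x, w):
        for j, wij in enumerate(row):
            if wij == 1:
                y[j] += xi   # weight +1: accumulate the input
            elif wij == -1:
                y[j] -= xi   # weight -1: subtract the input
            # weight 0: the input contributes nothing, so skip it
    return y


def dense_matvec(x, w):
    """Reference result using ordinary multiply-accumulate."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


# Toy example: 3 inputs, 2 outputs (values chosen arbitrarily).
x = [0.5, -2.0, 3.0]
w = [[1, 0],
     [-1, 1],
     [0, -1]]

print(ternary_matvec(x, w))  # [2.5, -5.0]
print(dense_matvec(x, w))    # [2.5, -5.0]
```

Both functions return the same vector, but the ternary version never multiplies. That property is what makes such layers cheaper in memory and energy on suitable hardware.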
Videos
Scalable MatMul-free Language Modeling (Paper Explained), YouTube
Taxonomy
Topics: Model-Driven Software Engineering Techniques
