Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing

Ivan Knunyants; Maryam Tavakol; Manolis Sifalakis; Yingfu Xu; Amirreza Yousefzadeh; Guangzhi Tang

arXiv:2501.16337·cs.NE·January 29, 2025·2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a training-free method to sparsify activations in recurrent large language models, significantly reducing energy consumption and latency on neuromorphic hardware while maintaining accuracy.

Contribution

It presents a novel, low-cost activation sparsification algorithm for R-LLMs that enhances energy efficiency without additional training, applicable to various LLM architectures.

Findings

01. Significant reduction in computational demands.

02. Notable energy savings and latency improvements on neuromorphic hardware.

03. Maintains competitive accuracy across benchmarks.

Abstract

The recent rise of Large Language Models (LLMs) has revolutionized the deep learning field. However, the desire to deploy LLMs on edge devices introduces energy efficiency and latency challenges. Recurrent LLM (R-LLM) architectures have proven effective in mitigating the quadratic complexity of self-attention, making them a potential paradigm for computing on-edge neuromorphic processors. In this work, we propose a low-cost, training-free algorithm to sparsify R-LLMs' activations to enhance energy efficiency on neuromorphic hardware. Our approach capitalizes on the inherent structure of these models, rendering them well-suited for energy-constrained environments. Although primarily designed for R-LLMs, this method can be generalized to other LLM architectures, such as transformers, as demonstrated on the OPT model, achieving comparable sparsity and efficiency improvements. Empirical…
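
The abstract does not spell out the sparsification rule, so the following is only a minimal sketch of one common training-free approach, magnitude thresholding, and not necessarily the authors' algorithm. It assumes a threshold is picked from a small set of calibration activations to hit a target sparsity, and that downstream hardware can skip multiply-accumulates (MACs) for zero inputs, as event-driven neuromorphic processors typically do. All function names and values here are hypothetical.

```python
def calibrate_threshold(samples, target_sparsity):
    """Pick a magnitude threshold so that about `target_sparsity`
    of the calibration activations are at or below it (assumed heuristic)."""
    mags = sorted(abs(x) for x in samples)
    k = int(target_sparsity * len(mags))
    return mags[k - 1] if k > 0 else 0.0


def sparsify(activations, threshold):
    """Zero every activation whose magnitude does not exceed the threshold."""
    return [x if abs(x) > threshold else 0.0 for x in activations]


def sparse_matvec(weights, x):
    """Compute y = W x while skipping zero inputs; also count MACs performed."""
    y = [0.0] * len(weights)
    macs = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue  # zero activation: no work is triggered for this column
        for i, row in enumerate(weights):
            y[i] += row[j] * xj
            macs += 1
    return y, macs


# Illustrative values only; not taken from the paper.
calib = [0.05, -0.1, 0.9, -1.2, 0.02, 0.4, -0.03, 0.7]
thr = calibrate_threshold(calib, target_sparsity=0.5)
x = sparsify(calib, thr)
W = [[1.0] * 8, [0.5] * 8]
y, macs = sparse_matvec(W, x)
dense_macs = len(W) * len(x)
print(f"threshold={thr}, MACs {macs}/{dense_macs}")
```

In this toy setup no weights are retrained; only a threshold is calibrated, which is what makes such a scheme "training-free". The MAC count is a rough proxy for energy on hardware that processes only non-zero activations.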

Code & Models

Repositories

ernis-lab/llm-activation-sparsity
PyTorch · Official

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing

Methods: OPT