Exploring Extreme Quantization in Spiking Language Models

Malyaban Bal; Yi Jiang; Abhronil Sengupta

arXiv:2405.02543 · cs.NE · July 2, 2024


Open Access

TL;DR

This paper introduces an ultra-quantized (1/1.58-bit) spiking language model architecture that aims to reduce energy consumption while remaining competitive on GLUE benchmark tasks. The model is trained by knowledge distillation from a full-precision teacher model.

Contribution

The paper develops the first binary/ternary (1/1.58-bit) spiking language model, using knowledge distillation from a full-precision teacher so that the extremely quantized architecture scales like a deep spiking LM.
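To make the "1/1.58-bit" label concrete: a binary weight takes one of two values (1 bit), while a ternary weight takes one of three values, which carries log2(3) ≈ 1.58 bits. The sketch below illustrates both cases in plain Python. The per-tensor mean-absolute-value scale and the function names `quantize_ternary` and `quantize_binary` are assumptions made for illustration; this summary does not state the exact quantizer used in the paper.

```python
import math

def quantize_ternary(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight (a common choice
    for ternary schemes), not necessarily the paper's quantizer.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def quantize_binary(weights):
    """Map float weights to codes in {-1, +1} plus one shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w >= 0 else -1 for w in weights]
    return codes, scale

# Information content of one ternary weight, in bits.
bits_per_ternary_weight = math.log2(3)

example_weights = [0.9, -0.05, -1.2, 0.4]
ternary_codes, ternary_scale = quantize_ternary(example_weights)
binary_codes, binary_scale = quantize_binary(example_weights)
```

A dequantized weight is simply `code * scale`, so a matrix product with ternary codes needs only additions, subtractions, and skips, plus one multiplication by the scale.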

Findings

01. Achieves competitive performance on GLUE benchmark tasks.

02. Demonstrates effective knowledge transfer from full-precision models.

03. Presents a scalable, ultra-quantized spiking LM architecture.

Abstract

Despite the growing prevalence of large language model (LLM) architectures, a crucial concern persists regarding their energy and power consumption, which still lags far behind the remarkable energy efficiency of the human brain. Recent strides in spiking language models (LM) and transformer architectures aim to address this concern by harnessing the spiking activity of biological neurons to enhance energy/power efficiency. Doubling down on the principles of model quantization and energy efficiency, this paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture. Achieving scalability comparable to a deep spiking LM architecture is facilitated by an efficient knowledge distillation technique, wherein knowledge from a non-spiking full-precision "teacher" model is transferred to an extremely weight quantized spiking "student" LM. Our proposed model…
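The teacher-student transfer described in the abstract can be illustrated with a generic distillation objective. The sketch below is a standard temperature-scaled, logit-level distillation loss (KL divergence between softened teacher and student distributions), written in plain Python. It is an assumption for illustration only: the truncated abstract does not say which signals the paper distills or which loss it uses, and the names `softmax` and `distillation_loss` and the default temperature are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: generic logit-level distillation, not the paper's
    specific technique. The T^2 factor is the usual rescaling that
    keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(p_teacher, p_student)
        if p > 0.0
    )
    return kl * temperature ** 2

# Hypothetical logits for a 3-class task.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

In this setting the teacher would be the non-spiking full-precision model and the student the weight-quantized spiking LM; the loss is zero only when the two softened distributions match.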

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · DNA and Biological Computing · Neural Networks and Applications

Methods: Knowledge Distillation