SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation

Changze Lv; Tianlong Li; Jianhan Xu; Chenxi Gu; Zixuan Ling; Cenyuan Zhang; Xiaoqing Zheng; Xuanjing Huang

arXiv:2308.15122·cs.CL·February 22, 2024·2 cites

TL;DR

SpikeBERT introduces a deep spiking Transformer for language tasks, trained via a two-stage knowledge distillation from BERT, achieving competitive accuracy with lower energy consumption.

Contribution

The paper develops SpikeBERT, a novel deep spiking Transformer model for language understanding, trained with a two-stage knowledge distillation from BERT, enabling efficient and effective language processing.

Findings

01. SpikeBERT outperforms existing SNNs on text classification.
02. SpikeBERT achieves comparable results to BERT with less energy.
03. Two-stage knowledge distillation improves SNN training for language tasks.

Abstract

Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently-proposed spiking Transformer (i.e., Spikformer) to make it possible to process language tasks and propose a two-stage knowledge distillation method for training it, which combines pre-training by distilling knowledge from BERT with a large collection of unlabelled texts and fine-tuning with task-specific instances via knowledge distillation again from the BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models…
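The two-stage distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss terms, so everything here is an assumption chosen for illustration: stage 1 is modelled as a mean-squared error between teacher and student hidden features, and stage 2 as a temperature-softened KL term on logits mixed with cross-entropy on the task label. The function names, `temperature`, and `alpha` are hypothetical and are not taken from the paper or its repository.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def stage1_loss(teacher_features, student_features):
    # Assumed pre-training objective: align student features with the
    # teacher's features on unlabelled text (mean-squared error).
    n = len(teacher_features)
    return sum((t - s) ** 2 for t, s in zip(teacher_features, student_features)) / n

def stage2_loss(teacher_logits, student_logits, label, temperature=2.0, alpha=0.5):
    # Assumed fine-tuning objective: soft-label distillation from the
    # fine-tuned teacher plus cross-entropy on the ground-truth label.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * distill + (1 - alpha) * hard

if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    print(stage1_loss([0.2, -0.1, 0.4], [0.1, 0.0, 0.5]))
    print(stage2_loss([2.0, 0.5], [1.0, 1.2], label=0))
```

In this sketch the student would first be trained with `stage1_loss` on unlabelled text and then with `stage2_loss` on task-specific examples, mirroring the pre-training and fine-tuning order given in the abstract.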

Code & Models

Repositories

Lvchangze/SpikeBERT (PyTorch, official)

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Robotics and Automated Systems

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Absolute Position Encodings · Transformer · Linear Layer · Layer Normalization · Dropout