Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge Distillation

Jiaqi Wang; Liutao Yu; Liwei Huang; Chenlin Zhou; Han Zhang; Zhenxi Song; Min Zhang; Zhengyu Ma; Zhiguo Zhang

arXiv:2412.12858·cs.LG·December 18, 2024

Open Access

TL;DR

This paper introduces SpikeSCR, a fully spike-driven framework for speech command recognition. Combined with curriculum learning-based knowledge distillation (KDCL), it reduces energy consumption by 54.8% while maintaining high accuracy on benchmark datasets.

Contribution

It presents a novel hybrid structure for efficient long-term learning in SNNs and a curriculum learning-based knowledge distillation method to balance performance and energy efficiency.
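For readers unfamiliar with knowledge distillation, the general idea can be sketched as below. This is a generic temperature-scaled distillation loss, not the paper's KDCL method, and it contains no curriculum schedule; the function names, `temperature`, and `alpha` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (assumed form, not the paper's KDCL).

    Mixes cross-entropy on the true label with the KL divergence
    between softened teacher and student distributions.
    """
    # Hard-label term: cross-entropy of the student on the true class.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable to the hard term's.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

When the student's logits match the teacher's, the soft-label term vanishes and only the cross-entropy term remains.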

Findings

1. SpikeSCR outperforms current SOTA methods with the same number of time steps.

2. KDCL reduces time steps by 60% and energy consumption by 54.8%.

3. Performance remains comparable to recent SOTA results.

Abstract

The intrinsic dynamics and event-driven nature of spiking neural networks (SNNs) make them excel in processing temporal information by naturally utilizing embedded time sequences as time steps. Recent studies adopting this approach have demonstrated SNNs' effectiveness in speech command recognition, achieving high performance by employing large time steps for long time sequences. However, the large time steps lead to increased deployment burdens for edge computing applications. Thus, it is important to balance high performance and low energy consumption when detecting temporal patterns in edge devices. Our solution comprises two key components. 1). We propose a high-performance fully spike-driven framework termed SpikeSCR, characterized by a global-local hybrid structure for efficient representation learning, which exhibits long-term learning capabilities with extended time steps. 2).…
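To illustrate how a time sequence can map onto SNN time steps, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron that consumes one input value per time step. The LIF model and the parameters `tau`, `v_threshold`, and `v_reset` are assumptions chosen for illustration; the abstract does not state which neuron model SpikeSCR uses.

```python
def lif_forward(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run one LIF neuron over a sequence, one input per time step.

    Returns a list of binary spikes (0 or 1), one per time step.
    """
    v = v_reset  # membrane potential
    spikes = []
    for x in inputs:
        # Leaky integration: the potential moves toward the input
        # with time constant tau.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# A constant input of 1.2 needs three steps to reach the threshold.
print(lif_forward([1.2] * 6))  # [0, 0, 1, 0, 0, 1]
```

Because the neuron is evaluated once per time step, a longer input sequence means proportionally more updates, which is the deployment cost that reducing the number of time steps aims to lower.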

Taxonomy

Topics: Hand Gesture Recognition Systems · Speech Recognition and Synthesis

Methods: Knowledge Distillation