Speech Understanding on Tiny Devices with A Learning Cache

Afsara Benazir; Zhiming Xu; Felix Xiaozhu Lin (University of Virginia)

arXiv:2311.18188 · eess.AS · May 9, 2024 · 1 citation

TL;DR

This paper introduces SpeechCache, a novel on-device speech understanding system for tiny devices that leverages temporal locality and caching to reduce cloud offloading, improve latency, and maintain accuracy in challenging environments.

Contribution

It presents a new speech cache system that matches speech inputs at multiple levels and learns to personalize, enabling efficient on-device SLU on microcontrollers.

Findings

01. Resolves 45%-90% of inputs on device
02. Reduces latency by up to 80%
03. Effective even in noisy or adversarial conditions

Abstract

This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and only offload inputs not matched to any cached ones to the cloud for full inference. Realization of this idea, however, is non-trivial: the device needs to compare acoustic features in a robust yet low-cost way. To this end, we present SpeechCache (or SC), a speech cache for tiny devices. It matches speech inputs at two levels of representations: first by sequences of clustered raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary tradeoffs between cost and efficiency. To…
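
The lookup flow the abstract describes (try a cheap match on sound-unit sequences, fall back to a phoneme-level match, and offload only on a miss) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the use of normalized edit distance as the matching rule, the threshold values, and the names `TwoLevelCache`, `to_phonemes`, and `offload` are all assumptions made for illustration, and the abstract does not say how either level actually compares sequences.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (assumed matching metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


@dataclass
class Entry:
    units: Tuple[int, ...]      # sequence of clustered sound-unit IDs
    phonemes: Tuple[str, ...]   # phoneme sequence for the same utterance
    result: str                 # cached SLU result (e.g. an intent label)


@dataclass
class TwoLevelCache:
    # Hypothetical callables standing in for the cloud model and an
    # on-device phoneme recognizer; neither name comes from the paper.
    offload: Callable[[Any], str]
    to_phonemes: Callable[[Any], Sequence[str]]
    unit_threshold: float = 0.2      # assumed value
    phoneme_threshold: float = 0.2   # assumed value
    entries: List[Entry] = field(default_factory=list)

    def lookup(self, audio: Any, units: Sequence[int]) -> Tuple[str, str]:
        """Return (result, source), where source names the level that answered."""
        units = tuple(units)
        # Level 1: compare sound-unit sequences against every cached entry.
        for e in self.entries:
            if normalized_distance(units, e.units) <= self.unit_threshold:
                return e.result, "units"
        # Level 2: phonemes are computed only after level 1 misses.
        phonemes = tuple(self.to_phonemes(audio))
        for e in self.entries:
            if normalized_distance(phonemes, e.phonemes) <= self.phoneme_threshold:
                return e.result, "phonemes"
        # Miss at both levels: offload for full inference, then cache the result.
        result = self.offload(audio)
        self.entries.append(Entry(units, phonemes, result))
        return result, "cloud"


# Toy usage with fake audio handles; all data here is made up.
PHONEMES = {name: ("L", "AY", "T", "S", "AA", "N") for name in ("a1", "a2", "a3")}
cloud_calls: List[str] = []


def fake_cloud(audio: str) -> str:
    cloud_calls.append(audio)
    return "intent:lights_on"


cache = TwoLevelCache(offload=fake_cloud, to_phonemes=lambda a: PHONEMES[a])
r1 = cache.lookup("a1", [3, 3, 7, 1, 4, 4, 9, 2, 5, 5])            # empty cache: offloaded
r2 = cache.lookup("a2", [3, 3, 7, 1, 4, 4, 9, 2, 5, 6])            # near-identical units
r3 = cache.lookup("a3", [11, 12, 13, 14, 15, 16, 17, 18, 19, 10])  # units differ, phonemes match
```

In this toy run only the first utterance reaches the fake cloud; the second is answered at the unit level and the third at the phoneme level. A real system would also need eviction and the personalization step the paper mentions, both of which are omitted here.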

Code & Models

Repositories

afsara-ben/speechcache (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling