Speech Understanding on Tiny Devices with A Learning Cache
Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin (University of Virginia)

TL;DR
This paper introduces SpeechCache, a novel on-device speech understanding system for tiny devices that leverages temporal locality and caching to reduce cloud offloading, improve latency, and maintain accuracy in challenging environments.
Contribution
It presents a new speech cache that matches speech inputs at two levels of representation (clustered raw sound units, then phonemes) and learns to personalize. Together, these enable efficient on-device spoken language understanding (SLU) on microcontrollers.
Findings
Resolves 45%-90% of inputs on device
Reduces latency by up to 80%
Effective even in noisy or adversarial conditions
Abstract
This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and only offload inputs not matched to any cached ones to the cloud for full inference. Realization of this idea, however, is non-trivial: the device needs to compare acoustic features in a robust yet low-cost way. To this end, we present SpeechCache (or SC), a speech cache for tiny devices. It matches speech inputs at two levels of representations: first by sequences of clustered raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary tradeoffs between cost and efficiency. To…
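The abstract's core mechanism can be sketched in Python: match an input locally, check a cheap representation before a costlier one, and offload only on a miss. This is a minimal illustration under stated assumptions, not the authors' implementation. Nearest-centroid quantization with repeat collapsing stands in for the paper's clustering of raw sound units, and normalized Levenshtein edit distance stands in for its matching metric. The names `SpeechCacheSketch`, `phonemes_fn`, and `cloud_fn`, and both thresholds, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


def quantize_to_units(frames: Sequence[Sequence[float]],
                      centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to its nearest centroid ID, collapsing repeats.

    Assumption: plain nearest-centroid assignment (as after k-means) stands in
    for the paper's clustering of raw sound units.
    """
    units: List[int] = []
    for frame in frames:
        best = min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])))
        if not units or units[-1] != best:  # merge runs of the same unit
            units.append(best)
    return units


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance computed with a rolling single-row DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence's length."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


@dataclass
class CacheEntry:
    units: List[int]
    phonemes: List[str]
    intent: str  # cached SLU result for this utterance


@dataclass
class SpeechCacheSketch:
    # Hypothetical thresholds on normalized distance; not taken from the paper.
    unit_threshold: float = 0.2
    phoneme_threshold: float = 0.3
    entries: List[CacheEntry] = field(default_factory=list)

    def _best(self, query: Sequence, key: Callable[[CacheEntry], Sequence]
              ) -> Optional[Tuple[float, CacheEntry]]:
        """Closest cached entry under `key`, or None if the cache is empty."""
        scored = [(normalized_distance(query, key(e)), e) for e in self.entries]
        return min(scored, key=lambda s: s[0], default=None)

    def lookup(self, units: List[int],
               phonemes_fn: Callable[[], List[str]],
               cloud_fn: Callable[[], str]) -> Tuple[str, str]:
        """Return (intent, level that resolved it)."""
        # Level 1: cheap comparison of clustered sound-unit sequences.
        hit = self._best(units, lambda e: e.units)
        if hit is not None and hit[0] <= self.unit_threshold:
            return hit[1].intent, "unit"
        # Level 2: costlier phoneme extraction, run only after a level-1 miss.
        phonemes = phonemes_fn()
        hit = self._best(phonemes, lambda e: e.phonemes)
        if hit is not None and hit[0] <= self.phoneme_threshold:
            return hit[1].intent, "phoneme"
        # Miss at both levels: offload to the cloud and cache the result.
        intent = cloud_fn()
        self.entries.append(CacheEntry(units, phonemes, intent))
        return intent, "cloud"


cache = SpeechCacheSketch()
print(cache.lookup([0, 1, 2], lambda: ["t", "er", "n"], lambda: "turn_on"))  # ('turn_on', 'cloud')
print(cache.lookup([0, 1, 2], lambda: ["t", "er", "n"], lambda: "unused"))   # ('turn_on', 'unit')
print(cache.lookup([7, 8, 9], lambda: ["t", "er", "n"], lambda: "unused"))   # ('turn_on', 'phoneme')
```

Passing phoneme extraction as a callable means the costlier second level runs only when the unit-level check misses, which mirrors the cost ordering the abstract describes. On a full miss, the cloud result is inserted so that later repeats of the same command can be resolved locally, exploiting temporal locality.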
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
