TL;DR
FRILL is a lightweight non-semantic speech embedding built for mobile devices: it runs 32x faster than TRILL on a Pixel 1 and is 40% of TRILL's size, with an average accuracy decrease of only 2%, making mobile health applications practical.
Contribution
This work presents FRILL, a non-semantic speech embedding model based on TRILL and designed for real-time use on mobile devices. It combines novel architectural modifications with existing speed-up techniques.
Findings
FRILL runs 32x faster than TRILL on a Pixel 1 smartphone.
FRILL is 40% the size of TRILL, with an average decrease in accuracy of only 2%.
FRILL performs well on mobile health tasks like sound and speech detection.
Abstract
Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings where run-time performance can be a significant bottleneck. In this work, we propose a class of lightweight non-semantic speech embedding models that run efficiently on mobile devices based on the recently proposed TRILL speech embedding. We combine novel architectural modifications with existing speed-up techniques to create embedding models that are fast enough to run in real-time on a mobile device and exhibit minimal performance degradation on a benchmark of non-semantic speech tasks. One such model (FRILL) is 32x faster on a Pixel 1 smartphone and 40% the size of TRILL, with an average decrease in accuracy of only 2%. To our knowledge, FRILL is the highest-quality…
