Visual Wake Words Dataset
Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes

TL;DR
This paper introduces the Visual Wake Words dataset to facilitate the development of tiny, memory-efficient computer vision models suitable for deployment on microcontrollers in IoT applications.
Contribution
It provides a new dataset and benchmark for training and evaluating small vision models within strict memory constraints, advancing edge AI research.
Findings
Several state-of-the-art mobile models achieve 85-90% accuracy on Visual Wake Words within a 250 KB memory footprint.
The dataset enables realistic benchmarking for microcontroller vision applications.
It promotes development of models balancing accuracy and memory efficiency.
Abstract
The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory footprint in terms of peak usage and model size on device storage. To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words…
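The abstract measures footprint in two ways: peak memory usage and model size on device storage. The following is a minimal sketch of how such a budget check might look, not the authors' measurement procedure. The `Layer` record, the toy three-layer network, the 8-bit (1 byte) storage per weight and activation, the "input plus output buffer" estimate of peak usage for a simple layer chain, and the choice to apply the 250 KB limit to each quantity separately are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# 250 KB budget from the abstract; 1 KB = 1024 bytes is an assumption.
BUDGET_BYTES = 250 * 1024


@dataclass
class Layer:
    """Hypothetical per-layer record; not a structure defined in the paper."""
    name: str
    num_params: int    # weights plus biases
    input_elems: int   # elements in the input activation tensor
    output_elems: int  # elements in the output activation tensor


def model_size_bytes(layers: List[Layer], bytes_per_param: int = 1) -> int:
    """Storage needed for all parameters (default assumes 8-bit weights)."""
    return sum(layer.num_params for layer in layers) * bytes_per_param


def peak_activation_bytes(layers: List[Layer], bytes_per_activation: int = 1) -> int:
    """Rough peak usage for a plain layer chain: while a layer runs, its input
    and output buffers must both be resident, so take the largest such pair."""
    return max(l.input_elems + l.output_elems for l in layers) * bytes_per_activation


def fits_budget(layers: List[Layer], budget: int = BUDGET_BYTES) -> bool:
    """Assumption: the budget applies to model size and peak usage separately."""
    return (model_size_bytes(layers) <= budget
            and peak_activation_bytes(layers) <= budget)


# Toy network on a 96x96 grayscale input (illustrative numbers only).
toy = [
    Layer("conv1", 3 * 3 * 1 * 8 + 8, 96 * 96 * 1, 48 * 48 * 8),
    Layer("conv2", 3 * 3 * 8 * 16 + 16, 48 * 48 * 8, 24 * 24 * 16),
    Layer("fc", 24 * 24 * 16 * 2 + 2, 24 * 24 * 16, 2),
]

if __name__ == "__main__":
    print("model size (bytes):", model_size_bytes(toy))
    print("peak activations (bytes):", peak_activation_bytes(toy))
    print("fits 250 KB budget:", fits_budget(toy))
```

Real networks with branches or residual connections keep more than two buffers alive at once, so an estimate of this kind would understate their peak usage.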
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Visual Attention and Saliency Detection
