QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring
Nikhil P Ghanathe, Steven J E Wilton

TL;DR
QUTE is a resource-efficient ensemble architecture for tinyML that improves uncertainty quantification, reduces model size and latency, and enhances accuracy-drop detection on microcontrollers.
Contribution
QUTE introduces a novel early-exit-assisted ensemble design with knowledge distillation, optimized for tinyML, achieving better uncertainty estimates and higher efficiency than prior methods.
Findings
Achieves 59% smaller models than previous approaches.
Reduces latency by 31% on microcontrollers.
Outperforms prior methods in detecting accuracy drops.
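To make "detecting accuracy drops" concrete: with no true labels on the device, a monitor can only watch the model's own confidence. The sketch below is a generic illustration and not necessarily the procedure used in the paper. The class name `ConfidenceDropMonitor`, the window size, and the threshold are all hypothetical choices made for this example.

```python
from collections import deque


class ConfidenceDropMonitor:
    """Flags a possible accuracy drop when mean confidence over a sliding window falls below a threshold.

    Hypothetical helper for illustration; the parameters are assumptions, not values from the paper.
    """

    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # keeps only the most recent confidences
        self.threshold = threshold

    def update(self, confidence):
        """Record one prediction's confidence; return True if a drop is flagged."""
        self.window.append(confidence)
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples yet to judge
        mean_confidence = sum(self.window) / len(self.window)
        return mean_confidence < self.threshold


monitor = ConfidenceDropMonitor(window_size=3, threshold=0.6)
flags = [monitor.update(c) for c in [0.9, 0.9, 0.9, 0.3, 0.3]]
```

A monitor of this kind is only as good as the confidence values fed into it, which is why the quality of the uncertainty estimate matters for label-free monitoring.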
Abstract
Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized TinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models, achieving comparable performance on larger…
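As a rough illustration of how an ensemble of output blocks can yield an uncertainty estimate in a single forward pass, the sketch below averages the softmax outputs of several heads and reports the entropy of the averaged distribution. It is a minimal example under stated assumptions, not the QUTE implementation: the function names `softmax` and `ensemble_predict` are hypothetical, the logits are made up, and entropy is just one common uncertainty measure. The distillation of early-exit knowledge into the heads happens at training time and is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(head_logits):
    """Average per-head class probabilities and compute predictive entropy.

    head_logits: one list of logits per output head (hypothetical input layout).
    Returns (mean_probs, entropy_in_nats).
    """
    per_head = [softmax(logits) for logits in head_logits]
    n_heads = len(per_head)
    n_classes = len(per_head[0])
    mean_probs = [sum(p[c] for p in per_head) / n_heads for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


# Heads that agree give a peaked average; heads that disagree give a flatter one.
agree_probs, agree_entropy = ensemble_predict([[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
split_probs, split_entropy = ensemble_predict([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
```

In this toy setup, disagreement between heads raises the entropy of the averaged prediction, which is the signal an ensemble-based monitor relies on. This is also why the abstract stresses that the ensemble should be diverse: identical heads would add cost without adding information.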
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Topic Modeling
Methods: Balanced Selection
