TCUQ: Single-Pass Uncertainty Quantification from Temporal Consistency with Streaming Conformal Calibration for TinyML
Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh

TL;DR
TCUQ is a lightweight, single-pass uncertainty quantification method for TinyML that uses temporal consistency and streaming conformal calibration to provide calibrated risk scores on microcontrollers, improving reliability and efficiency.
Contribution
It introduces TCUQ, a novel, resource-efficient uncertainty monitor that operates in a single pass without labels, suitable for TinyML devices, and enhances failure detection and calibration.
Findings
Reduces footprint by about 50 to 60% and latency by about 30 to 45% compared with early exit and deep ensembles.
Attains up to 0.92 AUROC for failure detection.
Improves accuracy-drop detection on corrupted in-distribution streams by 3 to 7 AUPRC points, reaching up to 0.86 AUPRC at high severities.
Abstract
We introduce TCUQ, a single-pass, label-free uncertainty monitor for streaming TinyML that converts short-horizon temporal consistency, captured via lightweight signals on posteriors and features, into a calibrated risk score with an O(W) ring buffer and O(1) per-step updates. A streaming conformal layer turns this score into a budgeted accept/abstain rule, yielding calibrated behavior without online labels or extra forward passes. On microcontrollers, TCUQ fits comfortably on kilobyte-scale devices and reduces footprint and latency versus early exit and deep ensembles (typically about 50 to 60% smaller and about 30 to 45% faster), while methods of similar accuracy often run out of memory. Under corrupted in-distribution streams, TCUQ improves accuracy-drop detection by 3 to 7 AUPRC points and reaches up to 0.86 AUPRC at high severities; for failure detection it attains up to 0.92 AUROC.…
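The two mechanisms named in the abstract, a fixed-size ring buffer of recent posteriors and a streaming threshold that enforces an abstain budget, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the class name `TemporalConsistencyMonitor`, the single total-variation disagreement signal (the paper blends several signals on posteriors and features), the lag set, and the online pinball-loss quantile tracker that stands in for the paper's streaming conformal layer.

```python
from collections import deque


def total_variation(p, q):
    """Half the L1 distance between two probability vectors; lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


class TemporalConsistencyMonitor:
    """Hypothetical sketch: lagged-posterior disagreement with a budgeted threshold."""

    def __init__(self, window=8, lags=(1, 2, 4), budget=0.1, step=0.01):
        if max(lags) > window:
            raise ValueError("every lag must fit inside the window")
        self.buffer = deque(maxlen=window)  # ring buffer holding the last W posteriors
        self.lags = lags
        self.budget = budget    # target long-run fraction of abstentions
        self.step = step        # learning rate of the quantile tracker (assumed value)
        self.threshold = 0.5    # initial guess, adapted online without labels

    def update(self, posterior):
        """Consume one posterior from the single forward pass; return (score, abstain)."""
        usable = [lag for lag in self.lags if lag <= len(self.buffer)]
        if usable:
            # Mean disagreement between the current posterior and its lagged copies.
            score = sum(
                total_variation(posterior, self.buffer[-lag]) for lag in usable
            ) / len(usable)
        else:
            score = 0.0  # no history yet
        abstain = score > self.threshold
        # Pinball-loss gradient step: the threshold drifts toward the
        # (1 - budget) quantile of the score stream.
        self.threshold += self.step * ((1.0 if abstain else 0.0) - self.budget)
        self.buffer.append(posterior)  # the oldest entry is evicted automatically
        return score, abstain


if __name__ == "__main__":
    monitor = TemporalConsistencyMonitor(window=4, lags=(1, 2))
    stream = [[0.9, 0.05, 0.05]] * 5 + [[0.05, 0.9, 0.05]]
    for posterior in stream:
        print(monitor.update(posterior))
```

Memory is bounded by the window length, and the work per step depends only on the number of lags and classes, not on how long the stream has run. The tracker raises the threshold after each abstention and lowers it slightly otherwise, so the abstain rate settles near `budget` on a stationary stream; unlike a split-conformal quantile, it carries no finite-sample coverage guarantee.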
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The paper addresses an important problem in the TinyML domain: reliable uncertainty quantification under strict hardware constraints. The proposed method is designed with these constraints in mind, employing a single forward pass, a small ring buffer, and constant-time updates. The use of temporal information is a practical approach to generating an uncertainty signal without the overhead associated with ensemble methods. The evaluation is a significant component of the work; it is conducted on
The performance of the proposed method depends on a set of hyperparameters, including the temporal window size (W), the lag set (L), and blending parameters. While these are tuned on a development set, the paper does not fully explore the sensitivity of the system to these choices, particularly in dynamic environments where stream characteristics may change. The method's foundation is the assumption of temporal consistency in the input stream, which may not hold for all potential TinyML applicat
- Combining short-horizon consistency with a streaming conformal threshold, in a design explicitly created for a single-pass MCU, seems clean and practical to me. The paper also explains well why competing approaches struggle.
- The engineering is well thought out: the method is O(1) per step with O(W(L + d')) bytes of state, the posteriors are kept in 8-bit, etc., which makes the whole idea actually deployable on kilobyte-scale devices.
- The results, reduce latency by 31-43%, shrunk fla
- The approach assumes that successive inputs are related, but what happens when that is not the case or some extreme event occurs, like a step function? Maybe you could add an example where you inject a random intermediate input to check for degradation, and describe scenarios where TCUQ is not appropriate.
- The logistic combiner is trained offline on a labelled set. This introduces a mismatch risk if deployment shifts differ from training shifts. Maybe you include an example that you excl
- This paper is generally well written, and its important aspects are easy to understand.
- I like the motivation of TCUQ, as the literature on uncertainty and robustness suffers from a trade-off between performance and computational efficiency (e.g., Ensembles, Bayesian-Net, etc.). TCUQ aims to minimize this trade-off, and may be helpful for safety and real-time applications on low-resource devices.
- From a novelty perspective, I feel the proposed method is not really novel, as TCUQ simply merges four existing temporal signals and then applies a conformal layer on top of the framework.
- There is no theoretical guarantee to formally explain why TCUQ can improve safety performance. This limits the paper's contribution and limits the understanding of TCUQ from a theoretical perspective.
- TCUQ does not consistently outperform other baselines in safety performance (e.g., Ensembles in Tab. 2 with
The strengths of this paper are:
- The problem statement is relevant and important. Deployment scenarios like phones and drones are always in fashion and make this paper relevant.
- Experiments on both large and small microcontrollers are a good touch.
The weaknesses of this paper are:
- The conformal quantile is discussed at a high level, but how it is actually calculated and why the authors use it over other methods is never mentioned. Overall, the methodology is a little incomplete and reads like high-level ideas instead of an actual methodological framework that includes full details and technical justification.
- The empirical evaluation seems incomplete at times also. The authors compare against standard UQ methods in the TinyML setting
Taxonomy
Topics: Medical Imaging Techniques and Applications · Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
