From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting
Alireza Namazi, Heman Shakeri

TL;DR
This paper proposes a task-aware evaluation framework for blood glucose forecasting that emphasizes clinical relevance over aggregate accuracy, incorporating real-world and simulated assessments for decision support.
Contribution
It introduces a novel evaluation framework that assesses models on operational alarm metrics and intervention response prediction, highlighting gaps between accuracy and clinical usefulness.
Findings
Models that look acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated.
Forecasting accuracy does not necessarily translate to effective insulin dosing predictions.
The framework reveals significant gaps between model accuracy and clinical decision support utility.
Abstract
Clinical time-series forecasting is increasingly studied for decision support, yet standard aggregate metrics can obscure whether a model is actually useful for the task it is meant to serve. In safety-critical settings, low average error can coexist with dangerous failures in exactly the high-risk regimes that matter most. We present a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support. For early warning, we evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day, metrics that reflect operational alarm burden rather than aggregate accuracy. We show that models appearing acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice, where insulin-on-board is elevated and missed warnings…
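The two early-warning metrics named in the abstract, event-level recall and false alarms per patient-day, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the rule that an alarm counts as a detection if it fires within `lead_window_min` minutes before an event onset, the rule that an event counts as post-bolus if a bolus occurred within `post_bolus_window_min` minutes before it, and the toy timestamps.

```python
import math


def event_level_metrics(event_onsets, alarm_times, lead_window_min, monitored_days):
    """Return (event-level recall, false alarms per patient-day).

    Times are minutes since the start of monitoring. The matching rule is an
    assumption: an event is detected if any alarm fires in the window
    [onset - lead_window_min, onset]. monitored_days must be positive.
    """
    detected = 0
    matched_alarms = set()
    for onset in event_onsets:
        hits = [
            i for i, t in enumerate(alarm_times)
            if onset - lead_window_min <= t <= onset
        ]
        if hits:
            detected += 1
            matched_alarms.update(hits)
    # Alarms that match no event are counted as false alarms.
    false_alarms = len(alarm_times) - len(matched_alarms)
    recall = detected / len(event_onsets) if event_onsets else math.nan
    return recall, false_alarms / monitored_days


def post_bolus_events(event_onsets, bolus_times, post_bolus_window_min):
    """Keep only events preceded by a bolus within the given window (assumed slice rule)."""
    return [
        e for e in event_onsets
        if any(e - post_bolus_window_min <= b <= e for b in bolus_times)
    ]


# Toy data, invented for illustration only.
events = [300, 900, 1500]         # hypoglycemia onsets (minutes)
alarms = [280, 600, 1490, 1495]   # times the model raised an alarm
boluses = [850]                   # insulin bolus times

recall, fa_per_day = event_level_metrics(
    events, alarms, lead_window_min=60, monitored_days=2.0
)

slice_events = post_bolus_events(events, boluses, post_bolus_window_min=120)
pb_recall, _ = event_level_metrics(
    slice_events, alarms, lead_window_min=60, monitored_days=2.0
)

print(f"overall recall: {recall:.3f}")
print(f"false alarms per patient-day: {fa_per_day:.2f}")
print(f"post-bolus recall: {pb_recall:.3f}")
```

On this toy data two of the three events are detected overall, with one unmatched alarm over two monitored days, while the single post-bolus event is missed. That is the kind of gap between an aggregate figure and a high-risk slice that the abstract describes, though the paper's own matching and slicing definitions may differ from the ones assumed here.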
