SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals
Ilia Azizi, Marc-Olivier Boldi, Valérie Chavez-Demoulin

TL;DR
SEMF is a versatile framework that extends the EM algorithm to supervised learning, enabling accurate and narrower prediction intervals across various models and datasets, outperforming traditional methods.
Contribution
It introduces a model-agnostic supervised EM framework for prediction intervals, improving uncertainty estimation without relying on quantile loss.
Findings
Produces narrower prediction intervals with maintained coverage
Outperforms traditional quantile regression methods
Effective across diverse models and datasets
Abstract
This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic approach for generating prediction intervals with any ML model. SEMF extends the Expectation-Maximization algorithm, traditionally used in unsupervised learning, to a supervised context, leveraging latent variable modeling for uncertainty estimation. Through extensive empirical evaluation of diverse simulated distributions and 11 real-world tabular datasets, SEMF consistently produces narrower prediction intervals while maintaining the desired coverage probability, outperforming traditional quantile regression methods. Furthermore, without using the quantile (pinball) loss, SEMF allows point predictors, including gradient-boosted trees and neural networks, to be calibrated with conformal quantile regression. The results indicate that SEMF enhances uncertainty quantification…
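The two quantities the abstract reports, coverage probability and interval width, together with the conformal quantile regression calibration step, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `coverage_and_width` and `cqr_margin`, the fixed raw interval, and the synthetic Gaussian data are all hypothetical, and the calibration shown is the standard split-conformal adjustment for interval predictors.

```python
import numpy as np


def coverage_and_width(y, lower, upper):
    """Return (empirical coverage, mean interval width) for a set of intervals."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    covered = (y >= lower) & (y <= upper)
    return float(covered.mean()), float((upper - lower).mean())


def cqr_margin(y_cal, lower_cal, upper_cal, alpha=0.1):
    """Split-conformal margin for interval predictions (hypothetical helper).

    The conformity score is how far each calibration target falls outside
    its interval (negative when it is inside). The margin is the
    ceil((n + 1) * (1 - alpha))-th smallest score; adding it to both ends
    of new intervals targets coverage of at least 1 - alpha.
    """
    y_cal, lower_cal, upper_cal = map(np.asarray, (y_cal, lower_cal, upper_cal))
    scores = np.sort(np.maximum(lower_cal - y_cal, y_cal - upper_cal))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return float("inf")
    return float(scores[k - 1])


# Toy data (assumption): standard normal targets and a deliberately
# too-narrow raw interval of [-0.5, 0.5] for every point.
rng = np.random.default_rng(0)
n = 500
y_cal, y_test = rng.normal(size=n), rng.normal(size=n)
raw_lower, raw_upper = np.full(n, -0.5), np.full(n, 0.5)

margin = cqr_margin(y_cal, raw_lower, raw_upper, alpha=0.1)
cov_raw, width_raw = coverage_and_width(y_test, raw_lower, raw_upper)
cov_cal, width_cal = coverage_and_width(y_test, raw_lower - margin, raw_upper + margin)

print(f"raw:        coverage={cov_raw:.3f}, mean width={width_raw:.3f}")
print(f"calibrated: coverage={cov_cal:.3f}, mean width={width_cal:.3f}")
```

Because the conformal step fixes coverage near the target level, comparisons between calibrated methods come down mostly to mean width, which is the sense in which narrower intervals at the same coverage count as an improvement.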
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Importance: Developing methods and models that improve predictions together with their uncertainty estimates is important for applications of predictive models, especially when incorrect predictions may carry a high risk. The model-agnostic framework the work aims to develop would be a great advantage, as it does not have to be tailored to the specifics of individual models. Novelty: The proposed EM framework is novel in the context of uncertainty assessment. The methodology of EM and its use in unsu
[W1] Intuition. The paper lacks intuition and arguments supporting the benefit of the SEMF framework. It says it aims to leverage a latent variable modeling framework for uncertainty estimation in predictions. The intuition and justification for this step and design are, however, not well argued in the paper. Adding text covering the intuition would greatly enhance the readability and clarity of the paper and its steps. Along the same lines it would be great to see arguments or intuition
- Adapting the EM algorithm seems pretty novel and the algorithm is well-motivated and sound. - There have been many methods introduced in the uncertainty literature which aim to produce prediction intervals, and this work would fit alongside those methods. - Consideration for missing data is an interesting application setting which prior works in uncertainty do not often consider. - Overall, the writing quality is good and mostly easy to follow
- The proposed method incorporates conformal prediction at the end, which guarantees correct coverage. The only other metric is the PI width. In that case, is the benefit of the method just in producing more tightly clustered samples, which lead to tighter PIs? - Is it correct to understand the proposed method as just producing samples from the modeled underlying distribution? There are other metrics which are sample-based that could provide a more holistic evaluation of the quality of the predict
- The SEMF proposed by the authors is a model-agnostic method, meaning it can be integrated with various machine learning models, providing high applicability and flexibility. - The problem addressed by the authors is often overlooked in the real world, specifically the presence of missing data and the uncertainty estimation of provided predictions. - The authors conducted experiments on a large number of datasets to validate the effectiveness of their method.
- The writing in this paper is unclear. I suggest that the authors introduce the research problem setting either before Section 2 or at the beginning of Section 2, rather than listing formulas. - The theoretical analysis in the paper assumes independent distributions among the variables, but real-world situations are often more complex. I believe the authors' investigation of this issue is not sufficiently thorough. - There appears to be a substantial amount of prior research [1, 2, 3] on interv
- The approach is model-agnostic and so can, in theory, easily be applied to any supervised learning baseline model. - The overall algorithm is well described and easy to follow, at least for someone who has worked on EM algorithms.
- Using MC sampling to approximate the posterior over the latent variables z is a potentially inaccurate approach, especially as z increases in dimensionality. This is the reason, in Bayesian inference, we would e.g. use MCMC sampling not MC sampling from the prior. Can the authors comment on why this is not a problem in their approach? Asked differently, how large did the number of samples R have to be in their cases to produce good results? How does R affect the quality of the results? - The r
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Fault Detection and Control Systems
