Minimax rate of consistency for linear models with missing values
Alexis Ayme (LPSM (UMR_8001)), Claire Boyer (LPSM (UMR_8001), MOKAPLAN), Aymeric Dieuleveut (CMAP), Erwan Scornet (CMAP)

TL;DR
This paper investigates the fundamental limits of learning linear models with missing data, proposing a new minimax optimal algorithm that effectively handles the exponential complexity caused by missing patterns.
Contribution
The paper introduces a rigorous analysis of least-squares estimators with missing data and develops a novel adaptive algorithm with minimax optimal risk bounds.
Findings
Excess risk bound grows exponentially with dimension for naive estimators.
Proposed algorithm achieves minimax optimal risk bounds.
Numerical experiments show improved performance over existing methods.
Abstract
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually prevents us from running standard learning algorithms. In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. Indeed, the Bayes rule can be decomposed as a sum of predictors corresponding to each missing pattern. This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes predictions impossible for current real-world datasets. First, we propose a rigorous setting to analyze a least-squares type estimator and establish a bound on the excess risk which increases exponentially in the dimension. Consequently,…
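To make the pattern decomposition concrete, here is a minimal sketch of a pattern-by-pattern least-squares predictor: one linear fit per missingness pattern seen in the training data, using only the coordinates observed under that pattern. It illustrates why up to 2^d separate fits may be needed for d features. It is not the adaptive algorithm proposed in the paper. The function names, the NaN encoding of missing entries, and the fallback value for unseen patterns are all assumptions made for this illustration.

```python
import numpy as np


def fit_pattern_by_pattern(X, y):
    """Fit one least-squares model per missingness pattern (hypothetical helper).

    X: (n, d) float array, with np.nan marking missing entries (assumed encoding).
    y: (n,) float array of responses.
    Returns a dict mapping each pattern (tuple of bools, True = missing)
    to (indices of observed columns, coefficients with intercept first).
    """
    masks = np.isnan(X)
    patterns = {tuple(m.tolist()) for m in masks}
    models = {}
    for pattern in patterns:
        p = np.array(pattern)
        rows = np.all(masks == p, axis=1)   # samples sharing this pattern
        obs = np.flatnonzero(~p)            # coordinates observed under it
        # Design matrix: intercept column plus the observed coordinates only.
        A = np.column_stack([np.ones(int(rows.sum())), X[rows][:, obs]])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models[pattern] = (obs, coef)
    return models


def predict_pattern_by_pattern(models, X, default=0.0):
    """Predict with the model of each row's pattern.

    Rows whose pattern was never seen in training get `default`
    (an arbitrary choice for this sketch).
    """
    masks = np.isnan(X)
    preds = np.full(X.shape[0], default, dtype=float)
    for i in range(X.shape[0]):
        key = tuple(masks[i].tolist())
        if key in models:
            obs, coef = models[key]
            preds[i] = coef[0] + X[i, obs] @ coef[1:]
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 3
    beta = np.array([1.0, -2.0, 0.5])
    X_full = rng.normal(size=(n, d))
    y = X_full @ beta + 0.1 * rng.normal(size=n)
    X = X_full.copy()
    X[rng.random((n, d)) < 0.3] = np.nan    # entries missing completely at random
    models = fit_pattern_by_pattern(X, y)
    print(f"{len(models)} patterns fitted (at most {2 ** d} possible)")
```

With d features the dictionary can hold up to 2^d entries, and each entry is fitted only on the samples that share its pattern. Rare patterns therefore get very few samples, which is one intuitive reading of the exponential dependence on the dimension mentioned in the abstract.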
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
