Knockout: A simple way to handle missing inputs
Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu

TL;DR
Knockout is a simple, efficient training method that improves the handling of missing inputs in multimodal deep learning models by implicitly marginalizing over missing features; it performs well across a variety of datasets.
Contribution
The paper introduces Knockout, a novel method that trains models to handle missing inputs by randomly replacing input features with placeholder values during training. It provides a theoretically justified and empirically effective solution.
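The training step described above, randomly replacing input features with a placeholder, can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the authors' code: the function name `knockout`, the per-feature knockout probability `p`, and the default placeholder value are hypothetical choices. At inference time, the same placeholder would be written into whichever features are actually missing, and the model is run once.

```python
import numpy as np


def knockout(x, p=0.25, placeholder=-1.0, rng=None):
    """Hypothetical helper: return a copy of x in which each feature is
    independently replaced by `placeholder` with probability p.

    x:           array of shape (n_samples, n_features)
    placeholder: scalar, or array of shape (n_features,) for per-feature values
    Returns the corrupted copy and the boolean mask of knocked-out entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    mask = rng.random(x.shape) < p  # True where a feature is knocked out
    fill = np.broadcast_to(np.asarray(placeholder, dtype=float), x.shape)
    return np.where(mask, fill, x), mask


# Usage: corrupt a training batch before the forward pass.
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 3))  # 4 samples, 3 features
x_ko, mask = knockout(batch, p=0.5, placeholder=-10.0, rng=rng)
```

Because only a mask is drawn per batch, the extra training cost is a single elementwise pass over the inputs.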
Findings
Knockout achieves competitive or superior performance compared to existing methods.
It is computationally efficient and scalable to high-dimensional data.
Theoretical analysis supports its interpretation as implicit marginalization.
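One way to see why the implicit-marginalization interpretation can hold is the following sketch. It rests on assumptions chosen here for illustration and is not necessarily the paper's exact derivation: squared loss, a knockout mask M drawn independently of (X, Y), a per-feature placeholder c_j lying outside the support of X_j, every needed mask pattern having positive probability, and a model flexible enough to reach the population optimum.

```latex
% Setup (assumed): x = (x_S, x_{\bar S}); M_j = 1 means feature j is knocked out;
% m_{\bar S} denotes the mask with ones exactly on \bar S.
\tilde X_j = (1 - M_j)\,X_j + M_j\,c_j, \qquad M \perp (X, Y), \qquad c_j \notin \operatorname{supp}(X_j)

% The squared-loss minimizer over all measurable f is the conditional mean:
f^\star = \arg\min_f \; \mathbb{E}\big[(Y - f(\tilde X))^2\big]
\;\Longrightarrow\; f^\star(\tilde x) = \mathbb{E}[\,Y \mid \tilde X = \tilde x\,]

% Since c_j lies outside supp(X_j), the input (x_S, c_{\bar S}) reveals M = m_{\bar S}; then
f^\star(x_S, c_{\bar S})
  = \mathbb{E}[\,Y \mid X_S = x_S,\; M = m_{\bar S}\,]
  = \mathbb{E}[\,Y \mid X_S = x_S\,]
  = \int \mathbb{E}[\,Y \mid x_S, x_{\bar S}\,]\; p(x_{\bar S} \mid x_S)\, dx_{\bar S}

% With nothing knocked out (M = 0), the same model recovers the full-input conditional:
f^\star(x) = \mathbb{E}[\,Y \mid X = x\,]
```

The second equality uses the independence of M from (X, Y), and the third is the law of total expectation. The same argument goes through for cross-entropy, whose minimizer is the conditional distribution p(y | x̃). If a placeholder value could also occur as a genuine observation, the first equality fails, because the conditional mean then mixes knocked-out and truly observed cases.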
Abstract
Deep learning models benefit from rich (e.g., multimodal) input features. However, multimodal models might be challenging to deploy, because some inputs may be missing at inference. Current popular solutions include marginalization, imputation, and training multiple models. Marginalization achieves calibrated predictions, but it is computationally expensive and only feasible for low-dimensional inputs. Imputation may result in inaccurate predictions, particularly when high-dimensional data, such as images, are missing. Training multiple models, where each model is designed to handle different subsets of inputs, can work well but requires prior knowledge of missing input patterns. Furthermore, training and retaining multiple models can be costly. We propose an efficient method to learn both the conditional distribution using full inputs and the marginal distributions. Our method,…
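The claim that a single model can learn both the full-input conditional and the marginals can be checked on a toy problem. The sketch below is illustrative only: the data-generating process, the placeholder value -1, the knockout rate 0.3, and the lookup-table "model" (an empirical mean per distinct input row, standing in for a high-capacity network) are all assumptions, not the paper's setup. It compares the Knockout-trained prediction for X1 = 1 with X2 missing against explicit marginalization of the same model's full-input predictions over p(X2 | X1 = 1).

```python
import numpy as np

PLACEHOLDER = -1  # assumed placeholder, outside the support {0, 1, 2} of both features


def simulate(n, rng):
    # Assumed toy process: X2 depends on X1, and Y depends on both.
    x1 = rng.integers(0, 3, size=n)
    x2 = (x1 + rng.integers(0, 2, size=n)) % 3
    y = x1 + 2.0 * x2 + rng.normal(scale=0.1, size=n)
    return np.stack([x1, x2], axis=1), y


def fit_table(x, y):
    # Stand-in for a high-capacity model: the mean of y for every distinct input row.
    table = {}
    for row in np.unique(x, axis=0):
        key = tuple(int(v) for v in row)
        table[key] = float(y[np.all(x == row, axis=1)].mean())
    return table


rng = np.random.default_rng(0)
x, y = simulate(200_000, rng)

# Knockout-style training data: each feature replaced by the placeholder w.p. 0.3.
mask = rng.random(x.shape) < 0.3
model = fit_table(np.where(mask, PLACEHOLDER, x), y)

# Single "forward pass" with X2 missing: X2 is set to the placeholder.
x1_query = 1
ko_pred = model[(x1_query, PLACEHOLDER)]

# Explicit marginalization of the same model's full-input predictions over p(X2 | X1 = 1).
x2_given_x1 = x[x[:, 0] == x1_query, 1]
p_x2 = np.bincount(x2_given_x1, minlength=3) / len(x2_given_x1)
marginal = sum(p_x2[v] * model[(x1_query, v)] for v in range(3) if p_x2[v] > 0)

print(ko_pred, marginal)  # both close to E[Y | X1 = 1] = 4.0 for this toy process
```

Under this toy process, X2 is 1 or 2 with equal probability when X1 = 1, so E[Y | X1 = 1] = 1 + 2 × 1.5 = 4.0. The point of the comparison is that the Knockout prediction needs one lookup, while explicit marginalization needs one evaluation per value of the missing feature.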
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The theoretical analysis is well-reasoned, and the empirical evaluation is comprehensive, covering a diverse set of tasks and data modalities. The authors have taken care to compare against appropriate baselines and provide ablation studies (e.g., structured vs. unstructured Knockout).
- The paper is well-written and clearly explains the core idea, theoretical justification, and experimental setup. The authors have provided sufficient details to facilitate reproducibility.
- The idea of randomly masking or corrupting inputs during training is not entirely new; many papers in the related work section use essentially the same approach, e.g., PartialVAE, VAEAC, and ACFlow.
- While the authors provide a theoretical justification for Knockout, the analysis assumes a very high-capacity, non-linear model trained on large data. It is unclear how well Knockout would perform with limited data or low-capacity models.
- The comparison against strong ba
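The review mentions an ablation of structured vs. unstructured Knockout. Assuming (this reading is an assumption, not a definition taken from the paper) that unstructured Knockout drops individual features independently while structured Knockout drops predefined groups of features together, such as every column belonging to one modality, the structured variant could be sketched as follows. The function name, the `groups` argument, and the per-group probability `p` are hypothetical.

```python
import numpy as np


def knockout_structured(x, groups, p=0.25, placeholder=-1.0, rng=None):
    """Hypothetical helper: replace whole feature groups with `placeholder`.

    x:      array of shape (n_samples, n_features)
    groups: list of column-index lists; each group is knocked out as a unit,
            independently per sample and per group, with probability p.
    Returns the corrupted copy and the boolean mask of knocked-out entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    for cols in groups:
        drop = rng.random(x.shape[0]) < p  # one draw per sample for this group
        mask[:, cols] |= drop[:, None]     # broadcast the draw across the group
    return np.where(mask, placeholder, x), mask


# Example: columns 0-2 form an "image" group, column 3 is a tabular feature.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
x_ko, mask = knockout_structured(x, groups=[[0, 1, 2], [3]], p=0.5, rng=rng)
```

Group-level masking matches a deployment scenario in which an entire modality is absent, whereas element-wise masking produces partially observed inputs that may never occur at inference.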
1. The manuscript is generally well-written, demonstrating quality and clarity.
2. The author presents a comprehensive review of related work.
3. The new Knockout method is evaluated against multiple strong baselines.
4. The author thoroughly discusses various types of missing data mechanisms and evaluates the performance of Knockout and common baselines on them.
1. Figure 2 shows that selecting an appropriate placeholder value has a strong impact on Knockout. While the author emphasizes the importance of this choice, a general guideline for choosing placeholder values is lacking, so the choice must be made case by case.
2. The simulation results appear somewhat limited. The input dimension of X is only 9, and the number of missing features ranges from 0 to 3. It would be beneficial to include simulations that better align with real-world dat
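Because the paper is said to give no general rule for choosing the placeholder, one simple heuristic is worth illustrating: pick a value that cannot occur in real data, so a knocked-out feature is never confused with an observed one, which is the condition that makes the implicit-marginalization reading work. The helper below is hypothetical and is not a guideline from the paper; the margin of one standard deviation below the training minimum is an arbitrary assumption.

```python
import numpy as np


def out_of_range_placeholders(x_train, margin=1.0):
    """Hypothetical heuristic: for each continuous feature, return a value just
    below the observed training range (min - margin * std), so a placeholder
    never coincides with a value seen in training. NaNs are ignored."""
    x_train = np.asarray(x_train, dtype=float)
    lo = np.nanmin(x_train, axis=0)
    spread = np.nanstd(x_train, axis=0)
    spread = np.where(spread > 0, spread, 1.0)  # constant columns still get a gap
    return lo - margin * spread


# Usage: one placeholder per feature, computed once from the training set.
x_train = np.array([[0.0, 5.0, 2.0],
                    [1.0, 5.0, 3.0],
                    [2.0, 5.0, 4.0]])
placeholders = out_of_range_placeholders(x_train)
```

For categorical features, the analogous choice is an extra category index that never appears in the data. Whether an out-of-range value is the best choice for a given architecture would still need to be checked empirically.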
- Elegant and practical solution that balances simplicity with theoretical soundness
- Strong theoretical foundation with rigorous mathematical analysis
- Impressive versatility across different data types and applications
- Practical single-model solution compared to existing multi-model approaches
- Comprehensive empirical evaluation with meaningful baselines
- Clear and actionable implementation guidelines for practitioners
- Limited theoretical analysis for finite-capacity models and small datasets, as theory assumes high-capacity models and large data
- Missing comparison against specialized models trained for specific missingness patterns
- No detailed ablation study on optimal placeholder value selection, despite its importance
- Lack of exploration into computational overhead during training compared to simpler approaches
- Limited discussion of failure cases or scenarios where the method might underperform
I appreciate the authors' efforts to run experiments on various datasets. The paper is also interesting and practical, proposing a simple and straightforward technique.
Although this paper is interesting and practical, it is very incremental in terms of research novelty relative to the expectations for an ICLR paper. Papers of this type need thorough experimental studies and solid comparisons to demonstrate their applied contributions, but I think this paper lacks comparisons with important baselines and prior works.
* The main missing baseline for comparison is the dropout method. Actually, the comparison between knockout and knock
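A reviewer asks for a comparison with dropout. The sketch below shows only how the two corruptions differ mechanically when applied to inputs (function names are hypothetical; standard inverted dropout is assumed). Dropout writes 0, which may also be a legitimate feature value, and rescales surviving entries by 1/(1 - p). Knockout, as summarized above, writes a dedicated placeholder and leaves surviving entries untouched. The code says nothing about which approach performs better empirically.

```python
import numpy as np


def input_dropout(x, p, rng):
    # Inverted dropout on the inputs (assumes 0 <= p < 1): zero each entry
    # w.p. p and rescale the survivors by 1 / (1 - p), so E[output] equals x.
    keep = rng.random(x.shape) >= p
    return np.where(keep, x / (1.0 - p), 0.0)


def knockout_inputs(x, p, placeholder, rng):
    # Knockout-style corruption: replace each entry w.p. p with a placeholder;
    # surviving entries are passed through unscaled.
    drop = rng.random(x.shape) < p
    return np.where(drop, placeholder, x)


rng = np.random.default_rng(0)
x = np.array([[0.0, 1.0, 2.0],
              [3.0, 0.0, 4.0]])  # 0.0 is a genuinely observed value here
dropped = input_dropout(x, p=0.5, rng=rng)  # a 0.0 may be real or dropped
knocked = knockout_inputs(x, p=0.5, placeholder=-99.0, rng=rng)  # -99.0 marks a knocked-out entry, assuming it never occurs in the data
```

The ambiguity of zeros under dropout is exactly the situation in which a model cannot tell a missing feature from an observed one.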
