Gradient Guided Hypotheses: A unified solution to enable machine learning models on scarce and noisy data regimes
Paulo Neves, Joerg K. Wegner, Philippe Schwaller

TL;DR
Gradient Guided Hypotheses (GGH) is a versatile, architecture-agnostic algorithm that improves machine learning performance on scarce and noisy data by analyzing gradients to identify and mitigate data issues.
Contribution
The paper introduces GGH, a novel gradient-based framework that addresses missing and noisy data simultaneously, outperforming existing methods, especially in extreme data-scarcity scenarios.
Findings
GGH significantly improves model performance on datasets with up to 98.5% missing data.
GGH outperforms state-of-the-art imputation methods in high-scarcity regimes.
GGH effectively detects and filters noisy data, enhancing model robustness.
Abstract
Ensuring high-quality data is paramount for maximizing the performance of machine learning models and business intelligence systems. However, challenges in data quality, including noise in data capture, missing records, limited data production, and confounding variables, significantly constrain the potential performance of these systems. In this study, we propose an architecture-agnostic algorithm, Gradient Guided Hypotheses (GGH), designed to address these challenges. GGH analyses gradients from hypotheses as a proxy for distinct and possibly contradictory patterns in the data. This framework entails an additional step in machine learning training, where gradients can be included or excluded from backpropagation. In this manner, missing and noisy data are addressed through a unified solution that perceives both challenges as facets of the same overarching issue: the propagation of…
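The abstract describes the mechanism only at a high level: hypotheses produce gradients, and gradients are included in or excluded from backpropagation. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes a linear model with squared loss in NumPy, a fixed set of discrete candidate values as the "hypotheses" for a missing feature, the mean gradient of complete samples as the reference direction, and cosine similarity as the agreement measure. The names `per_sample_grads`, `gradient_guided_step` and `min_cos` are illustrative inventions. For each sample, the hypothesis whose gradient best agrees with the reference is kept. If even that one disagrees, the whole sample is excluded and treated as noise.

```python
import numpy as np


def per_sample_grads(w, X, y):
    """Gradient of 0.5 * (x @ w - y)**2 w.r.t. w, one row per sample."""
    residual = X @ w - y            # shape (n,)
    return residual[:, None] * X    # shape (n, d)


def gradient_guided_step(w, hyp_X, hyp_y, hyp_owner, lr=0.1, min_cos=0.0):
    """One hypothetical gradient-selection step (illustration only).

    hyp_X, hyp_y : one row per hypothesis (candidate version of a sample).
    hyp_owner    : integer id of the original sample each hypothesis came from.
    Assumption: samples with a single hypothesis are trusted (complete) data.
    """
    g = per_sample_grads(w, hyp_X, hyp_y)
    counts = np.bincount(hyp_owner)
    trusted = counts[hyp_owner] == 1
    # Assumed reference direction: mean gradient of the trusted samples.
    ref = g[trusted].mean(axis=0) if trusted.any() else g.mean(axis=0)
    # Assumed agreement measure: cosine similarity to the reference.
    cos = g @ ref / (np.linalg.norm(g, axis=1) * np.linalg.norm(ref) + 1e-12)

    keep = np.zeros(len(g), dtype=bool)
    for owner in np.unique(hyp_owner):
        idx = np.flatnonzero(hyp_owner == owner)
        best = idx[np.argmax(cos[idx])]   # hypothesis most aligned with ref
        if cos[best] > min_cos:           # otherwise exclude the sample as noise
            keep[best] = True

    if keep.any():                        # backpropagate only the kept gradients
        w = w - lr * g[keep].mean(axis=0)
    return w, keep


# Toy usage: y = 2*x0 - x1, with x1 hidden for roughly half of the samples.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] - X[:, 1]
missing = rng.random(n) < 0.5
candidates = [-1.0, 0.0, 1.0]             # hypothetical guesses for a missing x1

rows, targets, owners = [], [], []
for i in range(n):
    for v in (candidates if missing[i] else [X[i, 1]]):
        rows.append([X[i, 0], v])
        targets.append(y[i])
        owners.append(i)
hyp_X, hyp_y, hyp_owner = np.array(rows), np.array(targets), np.array(owners)

w = np.zeros(2)
for _ in range(150):
    w, keep = gradient_guided_step(w, hyp_X, hyp_y, hyp_owner)
print("learned weights:", w)
```

Treating single-hypothesis samples as the trusted reference is one plausible design choice that keeps the selection anchored to known-good data. The paper's actual selection criterion, reference set and model setting may differ.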
Taxonomy
Topics: Computational Physics and Python Applications · Neural Networks and Applications · Machine Learning in Healthcare
