From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness

Terrence J. Lee-St. John; Jordan L. Lawson; Bartlomiej Piechowski-Jozwiak

arXiv:2603.12288·cs.LG·March 16, 2026

Open Access

TL;DR

This paper presents a data-architectural theory of how high-dimensional, error-prone data can still yield robust predictive models, arguing that robustness arises from the synergy between data architecture and model capacity rather than from data cleanliness alone.

Contribution

It introduces a novel theoretical framework combining Information Theory, Latent Factor Models, and Psychometrics to explain predictive robustness in high-D, error-prone data environments.

Findings

01. High-D predictor sets asymptotically overcome both Predictor Error and Structural Uncertainty

02. Informative Collinearity enhances model reliability

03. Higher dimensionality reduces the latent inference burden

Abstract

Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles from Information Theory, Latent Factor Models, and Psychometrics, clarifying that predictive robustness arises not solely from data cleanliness, but from the synergy between data architecture and model capacity. Partitioning predictor-space "noise" into "Predictor Error" and "Structural Uncertainty" (informational deficits from stochastic generative mappings), we prove that leveraging high-D sets of error-prone predictors asymptotically overcomes both types of noise, whereas cleaning a low-D set is fundamentally bounded by Structural Uncertainty. We demonstrate why "Informative Collinearity" (dependencies from shared latent causes) enhances…
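
The contrast between cleaning a low-D predictor set and using a high-D error-prone one can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: a single standard-normal latent factor, predictors that equal the latent plus independent Gaussian "structural" deviation plus independent Gaussian measurement error, and a plain average of the predictors standing in for a high-capacity model. The function names (`expected_corr`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random


def expected_corr(k, noise_var):
    """Closed-form correlation between the mean of k indicators and a
    unit-variance latent, when each indicator is the latent plus
    independent noise with total variance noise_var (toy assumption)."""
    return 1.0 / math.sqrt(1.0 + noise_var / k)


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(k, structural_sd, error_sd, n=2000, seed=0):
    """Draw n latent values, build k predictors per draw, and return the
    correlation between the latent and the average of the predictors.

    structural_sd: spread of the stochastic latent-to-predictor mapping
                   (stand-in for Structural Uncertainty; cleaning cannot remove it)
    error_sd:      spread of measurement error
                   (stand-in for Predictor Error; cleaning sets it to 0)
    """
    rng = random.Random(seed)
    latent, composite = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        predictors = [
            z + rng.gauss(0.0, structural_sd) + rng.gauss(0.0, error_sd)
            for _ in range(k)
        ]
        latent.append(z)
        composite.append(sum(predictors) / k)
    return pearson(latent, composite)


if __name__ == "__main__":
    # Two perfectly cleaned predictors: only structural deviation remains.
    low_d_clean = simulate(k=2, structural_sd=1.0, error_sd=0.0)
    # Two hundred uncleaned predictors: both noise sources are present.
    high_d_noisy = simulate(k=200, structural_sd=1.0, error_sd=1.0)
    print("low-D, cleaned:", round(low_d_clean, 3))
    print("high-D, noisy: ", round(high_d_noisy, 3))
```

Under these toy assumptions the closed form gives about 0.816 for two cleaned predictors (`expected_corr(2, 1.0)`) and about 0.995 for two hundred noisy ones (`expected_corr(200, 2.0)`). Removing measurement error entirely leaves the low-D composite capped by the structural term, while adding predictors shrinks both terms at rate 1/k. The paper's formal result may rest on different assumptions than this sketch.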

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Generative Adversarial Networks and Image Synthesis