No Free Delivery Service: Epistemic limits of passive data collection in complex social systems

Maximilian Nickel

arXiv:2411.13653 · cs.AI · November 22, 2024

TL;DR

This paper shows that, for widely considered inference settings in complex social systems, the standard train-test paradigm for model validation is invalid for any risk estimator with high probability, which places epistemic limits on AI validation and deployment.

Contribution

The paper provides formal impossibility results showing that the train-test paradigm is invalid in complex social systems, which highlights a fundamental epistemic challenge for AI validation.

Findings

01. Train-test paradigm is invalid for risk estimation in social systems.

02. Naive scaling and benchmarks do not address validation issues.

03. Formal results apply to recommender systems and large language models.
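
The first finding can be made concrete with a toy sketch. It is not the paper's formal construction, only a minimal illustration under assumed conditions: a hypothetical population of 100 items, a passive collection process that only ever observes the first 40, and a majority-class predictor. All names (`true_label`, `fit_majority`, `zero_one_risk`) are invented for this example. The held-out split is drawn from the same passively collected sample as the training data, so its risk estimate reflects only what that sample happened to cover.

```python
# Toy illustration (assumed setup, not taken from the paper):
# a held-out test split of passively collected data can report a
# risk that differs sharply from the risk on the full population.

items = list(range(100))  # hypothetical population of items


def true_label(x):
    # Assumed ground truth: first half of the items are labeled 1.
    return 1 if x < 50 else 0


# Passive collection: only the "popular" items (x < 40) are ever observed.
observed = [(x, true_label(x)) for x in items if x < 40]

# Standard train-test split of the observed data.
train = observed[0::2]
test = observed[1::2]


def fit_majority(data):
    # Predict the majority label seen in the training data.
    ones = sum(y for _, y in data)
    majority = 1 if 2 * ones >= len(data) else 0
    return lambda x: majority


def zero_one_risk(model, data):
    # Average 0-1 loss of the model on the given (x, y) pairs.
    return sum(model(x) != y for x, y in data) / len(data)


model = fit_majority(train)
held_out_risk = zero_one_risk(model, test)
target_risk = zero_one_risk(model, [(x, true_label(x)) for x in items])

print("held-out risk:", held_out_risk)
print("population risk:", target_risk)
```

In this sketch the held-out split reports zero error while half of the population is misclassified. The paper's results are stronger than this toy case: according to the abstract, they cover any risk estimator, including counterfactual and causal ones.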

Abstract

Rapid model validation via the train-test paradigm has been a key driver for the breathtaking progress in machine learning and AI. However, modern AI systems often depend on a combination of tasks and data collection practices that violate all assumptions ensuring test validity. Yet, without rigorous model validation we cannot ensure the intended outcomes of deployed AI systems, including positive social impact, nor continue to advance AI research in a scientifically sound way. In this paper, I will show that for widely considered inference settings in complex social systems the train-test paradigm does not only lack a justification but is indeed invalid for any risk estimator, including counterfactual and causal estimators, with high probability. These formal impossibility results highlight a fundamental epistemic issue, i.e., that for key tasks in modern AI we cannot know whether…

Videos

No Free Delivery Service: Epistemic limits of passive data collection in complex social systems · SlidesLive

Taxonomy

Topics: Information Systems Theories and Implementation · Mental Health and Patient Involvement · Digital Economy and Work Transformation

Methods: Attention Model