Probing the Effect of Selection Bias on Generalization: A Thought Experiment
John K. Tsotsos, Jun Luo

TL;DR
This paper explores how selection bias in training data affects the generalization ability of visual recognition systems through a thought experiment, highlighting potential limitations and areas for improvement.
Contribution
It introduces a theoretical thought experiment to analyze the impact of selection bias on model generalization, offering a new perspective on data bias issues.
Findings
Selection bias can significantly limit generalization.
Thought experiment highlights potential data collection deficiencies.
The thought experiment offers a framework for analyzing bias effects in learned systems.
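The first finding can be illustrated with a toy simulation. The sketch below is not the authors' thought experiment. It is a minimal, hypothetical setup, assuming a two-feature population on the unit square, an invented labeling rule (`x1 + x2 > 1`), an invented selection rule (only samples with `x2` in a narrow band are collected), and a 1-nearest-neighbor learner. All names and parameters are illustrative assumptions.

```python
import random


def true_label(x1, x2):
    # Hypothetical ground-truth rule for the full population (an assumption).
    return 1 if x1 + x2 > 1.0 else 0


def sample_population(rng, n):
    # Unbiased draw: points uniform over the unit square.
    return [(rng.random(), rng.random()) for _ in range(n)]


def sample_with_selection_bias(rng, n, lo=0.45, hi=0.55):
    # Biased draw: only points whose x2 falls in [lo, hi] are ever collected.
    points = []
    while len(points) < n:
        x1, x2 = rng.random(), rng.random()
        if lo <= x2 <= hi:
            points.append((x1, x2))
    return points


def attach_labels(points):
    return [(p, true_label(*p)) for p in points]


def predict_1nn(train, point):
    # 1-nearest-neighbor: copy the label of the closest training point.
    nearest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, test):
    correct = sum(predict_1nn(train, p) == y for p, y in test)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
unbiased_train = attach_labels(sample_population(rng, 300))
biased_train = attach_labels(sample_with_selection_bias(rng, 300))
test_set = attach_labels(sample_population(rng, 2000))  # drawn from the full population

acc_unbiased = accuracy(unbiased_train, test_set)
acc_biased = accuracy(biased_train, test_set)
print(f"trained on unbiased sample: {acc_unbiased:.3f}")
print(f"trained on biased sample:   {acc_biased:.3f}")
```

In this setup the biased training set never shows the learner how the label depends on `x2`, so its accuracy on the full population should fall noticeably below that of the learner trained on an unbiased sample of the same size. The size of the gap depends entirely on the assumed rules above and says nothing about real datasets.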
Abstract
Learned systems in the domain of visual recognition and cognition impress in part because even though they are trained with datasets many orders of magnitude smaller than the full population of possible images, they exhibit sufficient generalization to be applicable to new and previously unseen data. Since training data sets typically represent a small sampling of a domain, the possibility of bias in their composition is very real. But what are the limits of generalization given such bias, and up to what point might it be sufficient for a real problem task? Although many have examined issues regarding generalization, this question may require examining the data itself. Here, we focus on the characteristics of the training data that may play a role. Other disciplines have grappled with these problems, most interestingly epidemiology, where experimental bias is a critical concern. The range…
Taxonomy
Topics: Advanced Statistical Methods and Models · Computational Drug Discovery Methods · Bayesian Modeling and Causal Inference
