Probing the Effect of Selection Bias on Generalization: A Thought Experiment

John K. Tsotsos; Jun Luo

arXiv:2105.09934 · cs.CV · May 3, 2022

TL;DR

This paper explores how selection bias in training data affects the generalization ability of visual recognition systems through a thought experiment, highlighting potential limitations and areas for improvement.

Contribution

The paper introduces a thought experiment to analyze the impact of selection bias on model generalization, offering a new perspective on data bias issues.

Findings

01

Selection bias can significantly limit generalization.

02

The thought experiment highlights potential deficiencies in data collection.

03

The paper provides a framework for analyzing the effects of bias in learned systems.

Abstract

Learned systems in the domain of visual recognition and cognition impress in part because, even though they are trained with datasets many orders of magnitude smaller than the full population of possible images, they exhibit sufficient generalization to be applicable to new and previously unseen data. Since training datasets typically represent a small sampling of a domain, the possibility of bias in their composition is very real. But what are the limits of generalization given such bias, and up to what point might it be sufficient for a real problem task? Although many have examined issues regarding generalization, this question may require examining the data itself. Here, we focus on the characteristics of the training data that may play a role. Other disciplines have grappled with these problems, most interestingly epidemiology, where experimental bias is a critical concern. The range…

Taxonomy

Topics: Advanced Statistical Methods and Models · Computational Drug Discovery Methods · Bayesian Modeling and Causal Inference