A Simple Way to Deal with Cherry-picking

Junpei Komiyama; Takanori Maehara

arXiv:1810.04996 · stat.ME · October 12, 2018 · 1 citation

TL;DR

This paper addresses cherry-picking in hypothesis testing, especially in machine learning, and proposes a post-reporting verification method to reduce false discoveries caused by selection bias.

Contribution

It introduces a simple post-reporting verification approach to mitigate bias in reported results, supported by theoretical analysis and experimental validation.

Findings

01 Post-reporting verification reduces false discoveries.

02 The method is effective on synthetic and real datasets.

03 Selection bias can lead to false claims of innovation.

Abstract

Statistical hypothesis testing serves as statistical evidence for scientific innovation. However, if the reported results are intentionally biased, hypothesis testing no longer controls the rate of false discovery. In particular, we study such selection bias in machine learning models where the reporter is motivated to promote an algorithmic innovation. When the number of possible configurations (e.g., datasets) is large, we show that the reporter can falsely report an innovation even if there is no improvement at all. We propose a 'post-reporting' solution to this issue where the bias of the reported results is verified by another set of results. The theoretical findings are supported by experimental results with synthetic and real-world datasets.
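
The claim that a large number of configurations allows a false report even with no improvement can be illustrated with a small simulation. This is a minimal sketch and not the paper's analysis: it assumes each configuration yields an independent test whose p-value is uniform on [0, 1] under the null, and that the reporter shows only the best one. The function names `prob_false_report` and `simulate_cherry_picking` are hypothetical.

```python
import random


def prob_false_report(num_configs: int, alpha: float = 0.05) -> float:
    """Closed-form chance that at least one of `num_configs` independent
    null tests falls below `alpha` (assumes uniform null p-values)."""
    return 1.0 - (1.0 - alpha) ** num_configs


def simulate_cherry_picking(num_configs: int, alpha: float = 0.05,
                            trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability: the reporter runs
    every configuration and reports only the smallest p-value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null (no improvement), each p-value is Uniform(0, 1).
        best_p = min(rng.random() for _ in range(num_configs))
        if best_p < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        exact = prob_false_report(n)
        estimate = simulate_cherry_picking(n)
        print(f"configs={n:4d}  exact={exact:.3f}  simulated={estimate:.3f}")
```

Under these assumptions the chance of a spurious "significant" result grows from the nominal 5% with one configuration toward near certainty with a hundred, which is the gap a second, independent set of results is meant to expose.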

Taxonomy

Topics: Advanced Bandit Algorithms Research · Benford’s Law and Fraud Detection · Advanced Causal Inference Techniques