Overfitting the literature to one set of stimuli and data

Tijl Grootswagers; Amanda K Robinson

arXiv:2102.09729·q-bio.NC·July 9, 2021

TL;DR

The paper warns that Computational Cognitive Neuroscience risks overfitting because many papers develop and test new analysis methods on the same few stimuli and neuroimaging datasets. It calls for collecting more varied open data so that published results can be tested for generalisability.

Contribution

It identifies the problem of overfitting caused by data reuse and advocates for broader data collection to improve the reliability of research in the field.

Findings

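01 Publication bias and confirmatory exploration on shared data make overfitting a significant risk in current research practice.

02 Reusing the same stimuli and neuroimaging datasets limits how far findings generalise.

03 More good-quality open neuroimaging data, collected with a variety of experimental stimuli, are urgently needed.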

Abstract

The fast-growing field of Computational Cognitive Neuroscience is on track to meet its first crisis. A large number of papers in this nascent field are developing and testing novel analysis methods using the same stimuli and neuroimaging datasets. Publication bias and confirmatory exploration will result in overfitting to the limited available data. The field urgently needs to collect more good quality open neuroimaging data using a variety of experimental stimuli, to test the generalisability of current published results, and allow for more robust results in future work.
