Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression
Claas Flint, Micah Cearns, Nils Opel, Ronny Redlich, David M. A. Mehler, Daniel Emden, Nils R. Winter, Ramona Leenings, Simon B. Eickhoff, Tilo Kircher, Axel Krug, Igor Nenadic, Volker Arolt, Scott Clark, Bernhard T. Baune, Xiaoyi Jiang, Udo Dannlowski, Tim Hahn

TL;DR
This study reveals that small sample sizes in neuroimaging machine learning studies of depression often lead to overestimated performance metrics, highlighting the importance of larger test sets for valid results.
Contribution
It systematically demonstrates the risk of performance misestimation in small samples and emphasizes the need for larger test sets to improve reliability in neuroimaging ML studies.
Findings
Small samples can produce inflated accuracy estimates.
Large test sets mitigate performance overestimation.
Current literature may overstate ML performance due to small sample sizes.
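The findings above rest on a basic sampling argument: accuracy measured on a small test set is a noisy estimate, so some small studies will report values well above the model's true performance by chance alone. The following is a minimal sketch of that effect, not the authors' procedure. It assumes a hypothetical classifier with a fixed true accuracy of 0.60 (an illustrative value, not a result from the paper) and treats each test prediction as an independent coin flip; the function name `simulate_test_accuracies` and the 0.70 threshold are likewise made up for illustration.

```python
import random
import statistics


def simulate_test_accuracies(true_accuracy, test_size, n_repeats, seed=0):
    """Simulate n_repeats independent test sets and return the observed accuracies.

    Each prediction is modelled as correct with probability true_accuracy,
    which is a simplifying assumption (real test cases are not coin flips).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        accuracies.append(correct / test_size)
    return accuracies


# Assumed true accuracy, chosen only for illustration.
TRUE_ACCURACY = 0.60

for test_size in (20, 100, 1000):
    accs = simulate_test_accuracies(TRUE_ACCURACY, test_size, n_repeats=2000)
    # Fraction of simulated studies that would report at least 70% accuracy.
    share_high = sum(a >= 0.70 for a in accs) / len(accs)
    print(
        f"n_test={test_size:5d}  "
        f"sd={statistics.pstdev(accs):.3f}  "
        f"max={max(accs):.2f}  "
        f"share>=0.70: {share_high:.3f}"
    )
```

In this sketch the spread of observed accuracies shrinks roughly with the square root of the test set size, so estimates far above the assumed true value become rare as the test set grows. If mostly the high-scoring small studies get reported, the literature as a whole would look more optimistic than the underlying models justify.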
Abstract
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect, focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of . Next, we mimicked the process by which…
