The Pitfalls of Sample Selection: A Case Study on Lung Nodule Classification

Vasileios Baltatzis; Kyriaki-Margarita Bintsi; Loic Le Folgoc; Octavio E. Martinez Manzanera; Sam Ellis; Arjun Nair; Sujal Desai; Ben Glocker; Julia A. Schnabel

arXiv:2108.05386·cs.CV·August 13, 2021

TL;DR

This paper highlights how different data selection processes in lung nodule classification studies lead to inconsistent results, emphasizing the importance of standardized data practices for fair comparison and valid conclusions.

Contribution

The paper demonstrates how data selection and label aggregation choices affect model performance, and it argues for standardized methodologies in lung nodule classification research.

Findings

01. Different data selection processes cause significant performance variation.

02. Specific label aggregation choices can alter data distribution and results.

03. Advanced models may underperform simple baselines on challenging data subsets.

Abstract

Using publicly available data to determine the performance of methodological contributions is important as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and an assessment of the impact of individual contributions. When analyzing seven recent works, however, we find that each employs a different data selection process, leading to widely varying total numbers of samples and ratios between benign and malignant cases. As each subset will have different characteristics with varying difficulty for classification, a direct comparison between the proposed methods is thus not always possible, nor fair. We study the particular effect of truthing when aggregating labels from…
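
To make the point about label aggregation concrete, here is a minimal sketch of how two truthing choices can change both the number of usable samples and the benign/malignant ratio. It is not the authors' procedure, which the truncated abstract above does not spell out. The ratings are synthetic, and the names `RATINGS`, `select`, and `summarize` are hypothetical. The sketch assumes each nodule carries up to four radiologist malignancy ratings on a 1 to 5 scale, that an aggregated score below 3 counts as benign and above 3 as malignant, and that a score of exactly 3 is either dropped or assigned to one class.

```python
from statistics import mean, median

# Synthetic malignancy ratings (1 = benign ... 5 = malignant), one list per nodule.
# Illustrative values only; they are NOT taken from LIDC.
RATINGS = [
    [1, 2, 2, 1],
    [3, 3, 4, 2],
    [5, 4, 4],
    [2, 3, 3, 5],
    [3, 3],
    [1, 1, 5, 5],
    [4, 5, 5, 4],
    [2, 2, 3],
]


def select(ratings, aggregate, ambiguous="drop"):
    """Return (n_benign, n_malignant) under one truthing rule.

    aggregate: function mapping a list of ratings to a single score.
    ambiguous: handling of a score equal to 3: "drop", "benign", or "malignant".
    """
    benign = malignant = 0
    for nodule in ratings:
        score = aggregate(nodule)
        if score < 3:
            benign += 1
        elif score > 3:
            malignant += 1
        elif ambiguous == "benign":
            benign += 1
        elif ambiguous == "malignant":
            malignant += 1
        # ambiguous == "drop": the nodule is excluded from the subset
    return benign, malignant


def summarize(ratings):
    """Class counts for every combination of aggregator and ambiguity rule."""
    results = {}
    for name, aggregate in (("mean", mean), ("median", median)):
        for rule in ("drop", "benign", "malignant"):
            results[(name, rule)] = select(ratings, aggregate, rule)
    return results


if __name__ == "__main__":
    for (name, rule), (b, m) in summarize(RATINGS).items():
        print(f"{name:6s} score==3 -> {rule:9s}: benign={b} malignant={m} total={b + m}")
```

On these eight toy nodules, switching from the mean to the median moves one nodule from malignant to ambiguous, and the handling of ambiguous nodules changes the subset size from four or five samples to all eight. Differences of this kind, at dataset scale, are what make results reported on differently selected subsets hard to compare.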

Taxonomy

Methods: Attentive Walk-Aggregating Graph Neural Network