TL;DR
This paper demonstrates that improper dataset splitting in OCT image classification inflates model accuracy metrics, emphasizing the need for proper data handling to ensure valid evaluation of deep learning models.
Contribution
It shows how strongly data leakage from improper dataset splitting distorts the evaluation of OCT classification models, an issue that a large portion of prior work overlooks.
Findings
Data leakage inflates model performance by 0.07 to 0.43 in Matthews Correlation Coefficient (MCC)
Improper splitting causes accuracy to be overestimated by 5% to 30%
Proper dataset handling is crucial for valid model evaluation
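For readers unfamiliar with the metric in these findings, here is a minimal sketch of how MCC is computed from confusion-matrix counts. It covers the binary case only, which is an assumption: the paper's tasks may be multi-class, where a generalized form applies. The function name `binary_mcc` and the counts are illustrative and are not taken from the paper.

```python
import math

def binary_mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Illustrative counts only (not results from the paper):
# 90 of 100 predictions correct, errors balanced across classes.
accuracy = (45 + 45) / 100          # 0.9
mcc = binary_mcc(tp=45, tn=45, fp=5, fn=5)  # (2025 - 25) / 2500 = 0.8
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so MCC and accuracy differences are not on the same scale.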
Abstract
When deep learning is applied to optical coherence tomography (OCT) data, classification networks are commonly trained on 2D images extracted from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. An inappropriate data split can therefore result in overlap between the training and testing sets, an aspect that a large portion of the literature overlooks. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for three classification tasks using three widely used open-access OCT datasets: Kermany's and Srinivasan's ophthalmology datasets and the AIIMS breast tissue dataset. Results show that the classification performance is inflated by 0.07 up to 0.43 in terms of Matthews Correlation Coefficient (accuracy: 5% to 30%) for models tested on…
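One way to avoid the overlap described in the abstract is to split by volume rather than by image, so that near-identical consecutive slices never fall on both sides of the split. The sketch below is not the authors' code. It assumes each image comes with a volume identifier, and the names `split_by_volume`, `samples`, and the `volNN_sliceN` naming scheme are hypothetical.

```python
import random
from collections import defaultdict

def split_by_volume(samples, test_fraction=0.2, seed=0):
    """Split (image_id, volume_id) pairs so that no volume spans both sets.

    `samples` is a hypothetical layout: one pair per 2D image.
    """
    by_volume = defaultdict(list)
    for image_id, volume_id in samples:
        by_volume[volume_id].append(image_id)

    # Shuffle whole volumes, not individual images.
    volume_ids = sorted(by_volume)
    random.Random(seed).shuffle(volume_ids)

    n_test = max(1, round(len(volume_ids) * test_fraction))
    test_volumes = set(volume_ids[:n_test])

    train = [img for v in volume_ids if v not in test_volumes
             for img in by_volume[v]]
    test = [img for v in volume_ids if v in test_volumes
            for img in by_volume[v]]
    return train, test, test_volumes

# Toy data: 10 volumes with 5 slices each (illustrative only).
samples = [(f"vol{v:02d}_slice{s}", f"vol{v:02d}")
           for v in range(10) for s in range(5)]
train_ids, test_ids, test_volumes = split_by_volume(samples)
```

A per-image random split of the same toy data would almost certainly place slices from one volume in both sets, which is the leakage scenario the abstract warns about. If several volumes come from the same subject, grouping by subject identifier instead of volume identifier may be the safer choice.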
