Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

Iulian Emil Tampu; Anders Eklund; Neda Haj-Hosseini

arXiv:2202.12267 · eess.IV · September 29, 2022

TL;DR

This paper demonstrates that improper dataset splitting in OCT image classification inflates reported performance metrics, underscoring the need for proper data handling to ensure valid evaluation of deep learning models.

Contribution

The paper highlights the significant impact of data leakage caused by improper dataset splitting on the evaluation of OCT classification models, an issue largely overlooked in the literature.

Findings

01. Matthews Correlation Coefficient (MCC) inflated by 0.07 to 0.43 due to data leakage
02. Accuracy overestimated by 5% to 30% when the dataset is split improperly
03. Proper dataset handling is crucial for valid model evaluation
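The findings report inflation in Matthews Correlation Coefficient (MCC) alongside accuracy. As a reference for the metric, here is a minimal stdlib-only Python sketch of multi-class MCC (Gorodkin's R_K generalization, which reduces to the binary formula (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) for two classes). The function name and the toy labels are illustrative assumptions; this is not claimed to be the implementation used in the paper.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K); equals the binary MCC for two classes.

    Illustrative sketch only, not the paper's code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                        # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = Counter(y_true)   # number of true samples per class
    p_k = Counter(y_pred)   # number of predicted samples per class
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    # Convention: return 0.0 when a marginal is degenerate (e.g. one class predicted).
    return 0.0 if denom == 0 else cov_tp / denom


if __name__ == "__main__":
    # Chance-level predictions: accuracy is 0.5, but MCC is 0.0.
    print(matthews_corrcoef([0, 0, 1, 1], [0, 1, 0, 1]))
    # Binary case with TP=2, FN=1, TN=4, FP=1: accuracy 0.75, MCC = 7/15.
    print(matthews_corrcoef([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
```

Unlike accuracy, MCC stays near 0 for chance-level predictions and accounts for all four confusion-matrix cells, which makes it less sensitive to class imbalance.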

Abstract

In the application of deep learning to optical coherence tomography (OCT) data, it is common to train classification networks using 2D images originating from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. Thus, an inappropriate data split can result in overlap between the training and testing sets, an aspect overlooked by a large portion of the literature. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for three classification tasks using three widely used open-access OCT datasets: Kermany's and Srinivasan's ophthalmology datasets and the AIIMS breast tissue dataset. Results show that the classification performance is inflated by 0.07 to 0.43 in terms of Matthews Correlation Coefficient (accuracy: 5% to 30%) for models tested on…
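The leakage mechanism described in the abstract (near-identical consecutive slices of one volume landing in both the training and the test set) can be illustrated with a minimal stdlib-only Python sketch contrasting an image-level split with a volume-level split. The record layout (dicts with a hypothetical `volume_id` key), the function names, and the 10-volume toy dataset are illustrative assumptions, not the authors' code or data.

```python
import random
from collections import defaultdict


def image_level_split(samples, test_fraction, seed=0):
    """Leaky split: shuffles individual 2D images, so slices from one
    volume can end up in both the training and the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def volume_level_split(samples, test_fraction, seed=0):
    """Group-aware split: every image of a volume goes to the same set.
    'volume_id' is a hypothetical key standing in for whatever identifies
    the source volume (or patient) in a real dataset."""
    by_volume = defaultdict(list)
    for sample in samples:
        by_volume[sample["volume_id"]].append(sample)
    volume_ids = sorted(by_volume)
    rng = random.Random(seed)
    rng.shuffle(volume_ids)
    n_test = max(1, round(len(volume_ids) * test_fraction))
    test_ids = set(volume_ids[:n_test])
    train = [s for v in volume_ids if v not in test_ids for s in by_volume[v]]
    test = [s for v in volume_ids if v in test_ids for s in by_volume[v]]
    return train, test


def shared_volumes(train, test):
    """Volumes contributing images to both sets (empty for a leak-free split)."""
    return {s["volume_id"] for s in train} & {s["volume_id"] for s in test}


if __name__ == "__main__":
    # Toy data: 10 hypothetical volumes with 20 consecutive slices each.
    samples = [{"volume_id": v, "slice": i} for v in range(10) for i in range(20)]
    for name, split in [("image-level", image_level_split),
                        ("volume-level", volume_level_split)]:
        train, test = split(samples, test_fraction=0.2)
        print(f"{name}: {len(train)} train, {len(test)} test, "
              f"{len(shared_volumes(train, test))} volumes shared")
```

Keeping whole volumes together prevents neighbouring, near-duplicate slices from appearing on both sides of the split. Where several volumes come from the same patient, grouping by a (hypothetical) patient identifier instead of `volume_id` would be stricter still.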
