Outliers and anomalies in training and testing datasets for AI-powered morphometry—evidence from CT scans of the spleen
Yuriy Vasilev, Anastasia Pamova, Tatiana Bobrovskaya, Anton Vladzimirskyy, Olga Omelyanskaya, Elena Astapenko, Artem Kruchinkin, Novik Vladimir, Kirill Arzamasov

TL;DR
This study explores methods to detect outliers and anomalies in medical datasets used for training AI to measure organ sizes, using spleen CT scans as an example.
Contribution
The study identifies effective methods for detecting anomalies in morphometric datasets, combining visual, statistical, and machine learning approaches.
Findings
Visual methods like boxplots and histograms were effective for identifying outliers.
Machine learning algorithms such as one-class support vector machines (OSVM), k-nearest neighbors (KNN), and autoencoders also proved useful.
A total of 32 outlier anomalies were detected in the spleen dataset.
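The 1.5 interquartile range rule that underlies a boxplot can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `iqr_outliers`, the multiplier parameter `k`, and the sample measurements are all hypothetical, and the quartile convention (the "inclusive" method) is an assumption, since different tools compute quartiles slightly differently.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles use the 'inclusive' method, which treats the data as the
    full population; this choice is an assumption for the sketch.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical spleen lengths in millimetres (illustrative only, not study data).
lengths_mm = [98, 102, 105, 107, 110, 111, 113, 115, 118, 120, 185]

# Flags any measurement far outside the bulk of the distribution.
flagged = iqr_outliers(lengths_mm)
print(flagged)
```

A flagged value is only a candidate: it may be a labelling error or a genuinely enlarged organ, so each one still needs review against the source image.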
Abstract
Creating training and testing datasets for machine learning algorithms to measure linear dimensions of organs is a tedious task. There are no universally accepted methods for evaluating outliers or anomalies in such datasets. This can cause errors in machine learning and compromise the quality of end products. The goal of this study is to identify optimal methods for detecting organ anomalies and outliers in medical datasets designed to train and test neural networks in morphometrics. A dataset was created containing linear measurements of the spleen obtained from CT scans. Labelling was performed by three radiologists. The sample included N = 197 patients. Using visual methods (1.5 interquartile range; heat map; boxplot; histogram; scatter plot), machine learning algorithms (Isolation forest; Density-Based Spatial Clustering of Applications with…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · COVID-19 diagnosis using AI
