Detecting outliers in case-control cohorts for improving deep learning networks on Schizophrenia prediction
Daniel Martins, Maryam Abbasi, Conceição Egas, Joel P. Arrais

TL;DR
This study uses deep learning to improve schizophrenia prediction by identifying and filtering out genetic outliers in case-control datasets.
Contribution
A novel two-stage deep learning approach is introduced to detect and filter outliers in schizophrenia datasets, enhancing model performance.
Findings
Outlying genetic profiles in case-control datasets can hinder classification model performance.
Filtering outliers improves deep learning results and aligns them with heritability estimates for schizophrenia.
The approach enhances understanding of schizophrenia's genetic background and supports precision medicine in mental health.
Abstract
This study examines the genetic and clinical aspects of schizophrenia (SCZ), a complex mental disorder of uncertain etiology. Deep learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, given reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles that limit the performance of classification models. The study used a case-control dataset drawn from the Swedish population. A gene-annotation-based DL architecture was developed and applied in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can…
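The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: a nearest-centroid classifier stands in for the gene-annotation-based DL architecture, the cohort is synthetic two-feature toy data with a few deliberately flipped labels to mimic misdiagnosis, and "likely to be misclassified" is taken to mean "misclassified by the stage-one model", a criterion the abstract does not specify. All function names (`fit_centroids`, `predict`, `two_stage_filter`, `make_toy_cohort`) are hypothetical.

```python
import random
from statistics import fmean


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def two_stage_filter(X, y):
    """Stage 1: fit on all samples and keep only those the model gets right.
    Stage 2: refit on the retained (refined) samples."""
    stage1 = fit_centroids(X, y)
    keep = [i for i, (x, lab) in enumerate(zip(X, y))
            if predict(stage1, x) == lab]
    stage2 = fit_centroids([X[i] for i in keep], [y[i] for i in keep])
    return stage1, stage2, keep


def make_toy_cohort(n_per_class=50, n_flipped=5, seed=0):
    """Synthetic cohort: two Gaussian clusters, with a few labels flipped
    in each class to imitate misdiagnosed samples (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, -3.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    flipped = (list(range(n_flipped))
               + list(range(n_per_class, n_per_class + n_flipped)))
    for i in flipped:
        y[i] = 1 - y[i]
    return X, y, flipped


X, y, flipped = make_toy_cohort()
stage1, stage2, kept = two_stage_filter(X, y)
removed = sorted(set(range(len(X))) - set(kept))
print(f"kept {len(kept)} of {len(X)} samples; removed indices: {removed}")
```

In this toy setting the removed samples are those whose label disagrees with the stage-one model, which is where the flipped labels are expected to land. In a real cohort, the stage-two model would be evaluated on held-out data rather than on the samples used for filtering, since filtering and scoring on the same samples inflates apparent performance.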
Taxonomy
Topics: Genetic Associations and Epidemiology · Machine Learning in Healthcare · Bioinformatics and Genomic Networks
