Phenotyping with Positive Unlabelled Learning for Genome-Wide Association Studies
Andre Vauvelle, Hamish Tomlinson, Aaron Sim, Spiros Denaxas

TL;DR
This paper introduces AnchorBERT, a machine learning model that combines anchor learning with a transformer architecture to reduce phenotypic misclassification in electronic health records, so that genome-wide association studies (GWAS) can detect associations with fewer cases and controls.
Contribution
The study presents a new semi-supervised phenotyping method that reduces noise-induced misclassification, enabling more effective genetic association detection in GWAS.
Findings
AnchorBERT detects genomic associations previously found only in large consortium studies with 5 times more cases.
The model maintains 40% more significant associations when controls are halved.
Improved phenotyping enhances the power of GWAS to discover genetic links.
Abstract
Identifying phenotypes plays an important role in furthering our understanding of disease biology through practical applications within healthcare and the life sciences. The challenge of dealing with the complexities and noise within electronic health records (EHRs) has motivated applications of machine learning in phenotypic discovery. While recent research has focused on finding predictive subtypes for clinical decision support, here we instead focus on the noise that results in phenotypic misclassification, which can reduce a phenotype's ability to detect associations in genome-wide association studies (GWAS). We show that by combining anchor learning and transformer architectures into our proposed model, AnchorBERT, we are able to detect genomic associations previously found only in large consortium studies with 5× more cases. When reducing the number of controls available by…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Genetic Associations and Epidemiology
