LitGen: Genetic Literature Recommendation Guided by Human Explanations
Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

TL;DR
LitGen is a machine learning system designed to assist clinical genetic variant curation by retrieving and filtering scientific papers based on evidence types, leveraging human explanations and semi-supervised learning for improved accuracy.
Contribution
This work introduces the first ML system for targeted literature retrieval in clinical genetics, utilizing human explanations and semi-supervised learning to enhance evidence classification.
Findings
Achieved a 7.9%-12.6% relative performance improvement over baseline models trained only on the annotated papers.
Successfully trained on ClinGen-annotated papers and evaluated on new test data.
Demonstrated utility in improving clinical variant curation workflows.
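The reported gain is relative, not absolute, so it helps to see how such a figure is computed. The sketch below is only an illustration: the function name and both scores are hypothetical placeholders, not values from the paper, and the summary does not say which metric the figure refers to.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores chosen only to illustrate the arithmetic.
baseline_score = 0.50
improved_score = 0.55

gain = relative_improvement(baseline_score, improved_score)
print(f"relative improvement: {gain:.1%}")
```

With these placeholder scores, an absolute gain of 0.05 on a baseline of 0.50 corresponds to a 10% relative gain, which is why relative figures should always be read alongside the baseline they refer to.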
Abstract
As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different approaches and types of evidence, e.g. biochemical assays or case-control analysis. In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types curators use to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided…
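The retrieve-then-filter workflow described in the abstract can be pictured with a small sketch. This is not LitGen's implementation: the `Paper` record, the evidence-type labels, the per-type scores, and the threshold are all hypothetical, and the scores merely stand in for whatever an evidence-type classifier would output.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Paper:
    paper_id: str                      # hypothetical identifier
    variants: Set[str]                 # variants the paper mentions
    evidence_scores: Dict[str, float]  # hypothetical classifier score per evidence type


def filter_papers(papers: List[Paper], variant: str,
                  evidence_type: str, threshold: float = 0.5) -> List[Paper]:
    """Keep papers that mention `variant` and score at or above `threshold`
    for `evidence_type`, ordered from highest to lowest score."""
    hits = [
        p for p in papers
        if variant in p.variants
        and p.evidence_scores.get(evidence_type, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda p: p.evidence_scores[evidence_type], reverse=True)


# Made-up records for illustration only.
papers = [
    Paper("paper-A", {"variant-1"}, {"biochemical_assay": 0.9, "case_control": 0.1}),
    Paper("paper-B", {"variant-1"}, {"biochemical_assay": 0.6}),
    Paper("paper-C", {"variant-2"}, {"biochemical_assay": 0.8}),
]

for paper in filter_papers(papers, "variant-1", "biochemical_assay"):
    print(paper.paper_id)
```

Treating evidence types as independent scores reflects the fact that one paper may provide several kinds of evidence; whether LitGen models them this way is not stated in the summary above.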
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Genomics and Rare Diseases · Topic Modeling
