tmVar 3.0: an improved variant concept recognition and normalization tool
Chih-Hsuan Wei, Alexis Allot, Kevin Riehle, Aleksandar Milosavljevic, Zhiyong Lu

TL;DR
tmVar 3.0 is a text-mining tool that improves recognition and normalization of genetic variants in scientific literature, achieving over 90% F-measure and covering a broad range of variant types.
Contribution
tmVar 3.0 widens the recognition scope to more variant-related entities, groups different mentions of the same variant within an article, and adds normalization options such as allele-specific identifiers from the ClinGen Allele Registry.
Findings
Achieves over 90% F-measure in variant recognition and normalization.
Recognizes a wide spectrum of variant entities including alleles and copy number variants.
The entire PubMed and PMC collections have been processed, and the resulting annotations are publicly available.
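For readers unfamiliar with the metric behind the first finding, F-measure is commonly computed as the harmonic mean of precision and recall. The sketch below is a minimal illustration in Python, assuming the balanced F1 variant; the function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper's evaluation.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive and false-negative counts."""
    # Precision: share of predicted variant mentions that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of gold-standard variant mentions that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
score = f_measure(tp=90, fp=10, fn=10)
print(f"F-measure: {score:.3f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a tool cannot reach a high F-measure by maximizing precision alone or recall alone.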
Abstract
Previous studies have shown that automated text-mining tools are becoming increasingly important for successfully unlocking variant information in scientific literature at large scale. Despite multiple attempts in the past, existing tools are still of limited recognition scope and precision. We propose tmVar 3.0: an improved variant recognition and normalization tool. Compared to its predecessors, tmVar 3.0 is able to recognize a wide spectrum of variant-related entities (e.g., allele and copy number variants), and to group different variant mentions belonging to the same concept in an article for improved accuracy. Moreover, tmVar3 provides additional variant normalization options such as allele-specific identifiers from the ClinGen Allele Registry. tmVar3 exhibits state-of-the-art performance with over 90% accuracy in F-measure in variant recognition and normalization, when…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Gene expression and cancer classification
