TL;DR
This paper introduces a deep learning sequence-labeling framework for assigning species information to genes in research articles, significantly improving accuracy over traditional heuristic methods.
Contribution
The paper presents a novel deep learning-based sequence-labeling approach for gene-species assignment, outperforming rule-based methods in accuracy.
Findings
Accuracy improved from 65.8% (heuristic rule-based approach) to 81.3%.
Framing the task as sequence labeling means only a fraction of gene-species pairs need to be evaluated.
Open-source code and data available.
Abstract
The automatic assignment of species information to the corresponding genes in a research article is a critically important step in the gene normalization task, whereby a gene mention is normalized and linked to a database record or identifier by a text-mining algorithm. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, using a novel deep learning-based framework, to classify whether there is a relation between a gene and a species. Instead of the traditional binary classification framework in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence-labeling task such that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach obtains…
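The sketch below illustrates why a sequence-labeling framing can need fewer model inputs than pairwise binary classification. It is not the authors' implementation. The toy sentence, the `[G]`/`[/G]` gene markers, the `SP`/`O` tag set and all function names are hypothetical and chosen only for illustration. In this assumed framing, a pairwise classifier needs one input per (gene, species) pair. A tagger needs one input per gene mention and labels every species mention in a single pass.

```python
from itertools import product

# Hypothetical toy sentence; mentions are (start, end) token spans, end exclusive.
TOKENS = "Human TP53 and mouse Trp53 interact with BRCA1 .".split()
GENE_SPANS = [(1, 2), (4, 5), (7, 8)]      # TP53, Trp53, BRCA1
SPECIES_SPANS = [(0, 1), (3, 4)]           # Human, mouse


def pairwise_instances(gene_spans, species_spans):
    """Binary-classification framing: one classifier input per (gene, species) pair."""
    return list(product(gene_spans, species_spans))


def sequence_labeling_instances(tokens, gene_spans):
    """Assumed tagging framing: one input per gene mention.

    The target gene is wrapped in marker tokens (an illustrative assumption),
    so a tagger can label all tokens in one pass. The species related to that
    gene are then read off the predicted tags.
    """
    instances = []
    for start, end in gene_spans:
        marked = tokens[:start] + ["[G]"] + tokens[start:end] + ["[/G]"] + tokens[end:]
        instances.append(marked)
    return instances


def species_from_tags(tokens, tags):
    """Collect tokens tagged "SP" (related species); all other tokens are tagged "O"."""
    return [tok for tok, tag in zip(tokens, tags) if tag == "SP"]


pairs = pairwise_instances(GENE_SPANS, SPECIES_SPANS)
seqs = sequence_labeling_instances(TOKENS, GENE_SPANS)
print(len(pairs), len(seqs))  # 6 3

# Decode hand-written tags for the TP53 instance (illustrative, not model output).
tp53 = seqs[0]
tags = ["SP" if tok == "Human" else "O" for tok in tp53]
print(species_from_tags(tp53, tags))  # ['Human']
```

In this toy framing, the number of model inputs grows with the number of gene mentions rather than with genes × species. That mirrors the abstract's point that only a fraction of the pairs needs to be considered.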
