# A Silver Standard Corpus of Human Phenotype-Gene Relations

**Authors:** Diana Sousa, Andre Lamurias, Francisco M. Couto

arXiv: 1903.10728 · 2020-04-14

## TL;DR

This paper introduces the PGR corpus, a silver standard dataset of human phenotype-gene relations annotated automatically from biomedical abstracts. Two state-of-the-art deep learning tools applied to the corpus reached 78.05% precision in relation extraction.

## Contribution

To the best of the authors' knowledge, it provides the first publicly available corpus annotated with human phenotype-gene relations: a silver standard corpus generated with Named-Entity Recognition tools and partially evaluated by eight curators.

## Key findings

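- A partial evaluation of the automatically generated annotations by eight curators yielded 87.01% precision
- Two state-of-the-art deep learning tools reached 78.05% precision in relation extraction using the corpus
- The corpus is publicly available to support future research in phenotype-gene relation recognition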

## Abstract

Human phenotype-gene relations are fundamental to fully understanding the origin of some phenotypic abnormalities and their associated diseases. Biomedical literature is the most comprehensive source of these relations; however, we need Relation Extraction tools to recognize them automatically. Most of these tools require an annotated corpus, and to the best of our knowledge, no corpus annotated with human phenotype-gene relations is available. This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. The corpus consists of 1712 abstracts, 5676 human phenotype annotations, 13835 gene annotations, and 4283 relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. Using the corpus, we obtained promising results with two state-of-the-art deep learning tools, namely a precision of 78.05%. The PGR corpus was made publicly available to the research community.
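For readers unfamiliar with the metric, the sketch below shows how precision is conventionally computed and how the corpus sizes reported in the abstract translate into per-abstract averages. The function names `precision` and `per_abstract` are hypothetical, and the counts passed to `precision` are made-up illustrative values, not the curators' actual tallies, which this summary does not report.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted relations that were judged correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no predicted relations to evaluate")
    return true_positives / total


# Corpus sizes as stated in the abstract.
CORPUS = {
    "abstracts": 1712,
    "phenotype_annotations": 5676,
    "gene_annotations": 13835,
    "relations": 4283,
}


def per_abstract(counts: dict) -> dict:
    """Average number of each annotation type per abstract."""
    n = counts["abstracts"]
    return {name: value / n for name, value in counts.items() if name != "abstracts"}


if __name__ == "__main__":
    # Illustrative counts only (assumption): 3 correct, 1 incorrect.
    print(f"example precision: {precision(3, 1):.2%}")
    for name, average in per_abstract(CORPUS).items():
        print(f"{name} per abstract: {average:.2f}")
```

Precision only measures how many of the extracted relations are correct; it says nothing about relations the tools missed, so it should be read alongside recall where the full paper reports it.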

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.10728/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1903.10728/full.md

---
Source: https://tomesphere.com/paper/1903.10728