Creation of an Annotated Corpus of Spanish Radiology Reports

Viviana Cotik; Darío Filippo; Roland Roller; Hans Uszkoreit; and Feiyu Xu

arXiv:1710.11154 · cs.CL · November 1, 2017 · 1 citation

Open Access

TL;DR

This paper introduces a new annotated corpus of 513 Spanish radiology reports, a resource for developing and evaluating NLP algorithms in the biomedical domain, particularly for Spanish-language data.

Contribution

The paper creates and shares a corpus of Spanish radiology reports manually annotated with entities, negation and uncertainty terms, and relations, addressing the scarcity of annotated biomedical resources.

Findings

1. The corpus enables evaluation of named entity recognition (NER) and relation extraction algorithms.
2. It provides guidelines for creating similar annotated biomedical datasets.
3. It facilitates supervised learning approaches in Spanish biomedical NLP.

Abstract

This paper presents a new annotated corpus of 513 anonymized radiology reports written in Spanish. The reports were manually annotated with entities, negation and uncertainty terms, and relations. The corpus was conceived as an evaluation resource for named entity recognition and relation extraction algorithms, and as training input for supervised methods. Annotated biomedical resources are scarce because of confidentiality issues and the associated costs. This work provides guidelines that could help other researchers undertake similar tasks.
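The abstract does not describe the annotation file format or the evaluation protocol, so the following is only a minimal sketch under stated assumptions: annotations are stored as character-offset (standoff) spans, relations link two such spans, and systems are scored with strict exact-match precision, recall and F1. The labels `Finding`, `Negation` and `Negates`, the classes `Entity` and `Relation`, the helper `prf`, and the example sentence are all hypothetical and are not taken from the paper's annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labeled character span; frozen so it is hashable and comparable."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical label, e.g. "Finding" or "Negation"


@dataclass(frozen=True)
class Relation:
    """A labeled, directed link between two entities (hypothetical schema)."""
    head: Entity
    tail: Entity
    label: str


def prf(gold: set, predicted: set) -> tuple:
    """Exact-match precision, recall and F1 over sets of annotations."""
    tp = len(gold & predicted)  # items that match on every field
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical report sentence ("No pleural effusion is observed.")
text = "No se observa derrame pleural."

negation = Entity(0, 2, "Negation")       # "No"
gold_finding = Entity(14, 29, "Finding")  # "derrame pleural"
pred_finding = Entity(14, 21, "Finding")  # "derrame" (boundary error)

gold_entities = {negation, gold_finding}
pred_entities = {negation, pred_finding}
gold_relations = {Relation(negation, gold_finding, "Negates")}
pred_relations = {Relation(negation, pred_finding, "Negates")}

print(prf(gold_entities, pred_entities))    # (0.5, 0.5, 0.5)
print(prf(gold_relations, pred_relations))  # (0.0, 0.0, 0.0)
```

Under strict matching, the single boundary error costs half of the entity score and the whole relation score, because the predicted relation points at the wrong span. Lenient (overlap-based) matching is a common alternative; which scheme suits a given corpus is a design choice, not something the abstract specifies.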

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques