Creation of an Annotated Corpus of Spanish Radiology Reports
Viviana Cotik, Darío Filippo, Roland Roller, Hans Uszkoreit, and Feiyu Xu

TL;DR
This paper introduces a manually annotated corpus of 513 anonymized Spanish radiology reports, intended as a resource for developing and evaluating biomedical NLP algorithms on Spanish-language data.
Contribution
It creates and shares a manually annotated corpus of Spanish radiology reports covering entities, negation and uncertainty terms, and relations, addressing the scarcity of annotated biomedical resources.
Findings
Corpus enables evaluation of NER and relation extraction algorithms
Provides guidelines for creating similar biomedical annotated datasets
Facilitates supervised learning approaches in Spanish biomedical NLP
Abstract
This paper presents a new annotated corpus of 513 anonymized radiology reports written in Spanish. Reports were manually annotated with entities, negation and uncertainty terms, and relations. The corpus was conceived as an evaluation resource for named entity recognition and relation extraction algorithms, and as input for supervised methods. Biomedical annotated resources are scarce due to confidentiality issues and associated costs. This work provides some guidelines that could help other researchers undertake similar tasks.
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
