An integrated approach for rare disease detection and classification in Spanish pediatric medical reports
Andres Duque, Lourdes Araujo, Juan Martinez-Romo, María D. Esteban-Vasallo, María-Felicitas Domínguez-Berjón, David Malillos Perez

TL;DR
This paper introduces a system for detecting and classifying rare diseases in Spanish pediatric medical reports using both semi-supervised and supervised machine learning techniques.
Contribution
The novel contribution is a semi-supervised keyphrase-based system combined with expert validation for rare disease detection and classification in clinical texts.
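As a rough illustration of what keyphrase-based mention detection can look like, the sketch below matches a small lexicon against accent-normalized report text. It is not the authors' system: the semi-supervised keyphrase acquisition and the expert validation step are not reproduced, and the lexicon entries, the `KEYPHRASES` name, and the `normalize` and `detect_mentions` helpers are all hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip diacritics so 'quística' matches 'quistica'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical lexicon (assumption): normalized Spanish keyphrase -> disease label.
KEYPHRASES = {
    "fibrosis quistica": "cystic fibrosis",
    "sindrome de marfan": "Marfan syndrome",
}


def detect_mentions(report: str, keyphrases=KEYPHRASES):
    """Return (offset, keyphrase, label) tuples, sorted by offset.

    Offsets refer to the normalized text, not the raw report.
    """
    norm = normalize(report)
    hits = []
    for phrase, label in keyphrases.items():
        # Word boundaries avoid matching a keyphrase inside a longer token.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, norm):
            hits.append((match.start(), phrase, label))
    return sorted(hits)


if __name__ == "__main__":
    note = "Paciente con Fibrosis Quística y sospecha de síndrome de Marfan."
    for offset, phrase, label in detect_mentions(note):
        print(offset, phrase, label)
```

In a pipeline like the one described, candidate mentions produced this way would still need expert review before being used as labeled data.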
Findings
A validated dataset of 1900 annotated texts containing rare disease mentions was created.
Supervised models outperformed semi-supervised ones by over 10% in F-Measure for rare disease classification.
Semi-supervised models showed promise in handling limited data scenarios.
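For readers unfamiliar with the metric behind the comparison above, the following minimal sketch computes the F-Measure from true-positive, false-positive, and false-negative counts. The function name `f_measure` and the counts in the usage lines are illustrative assumptions; they are not results from the paper, and the summary does not state how the reported scores were averaged across disease classes.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; beta=1 gives the usual F1."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts for two hypothetical classifiers on the same test set.
    supervised = f_measure(tp=80, fp=20, fn=20)
    semi_supervised = f_measure(tp=60, fp=20, fn=40)
    print(f"supervised F1:      {supervised:.3f}")
    print(f"semi-supervised F1: {semi_supervised:.3f}")
    print(f"absolute gap:       {supervised - semi_supervised:.3f}")
```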
Abstract
Rare disease detection and classification is one of the most significant challenges in applying Natural Language Processing techniques to the analysis and extraction of information from biomedical texts. In this paper, we present novel research on the detection and classification of rare diseases in clinical notes from a cohort of pediatric patients in the Community of Madrid, Spain. Starting from a set of collected and anonymized medical records, we propose a semi-supervised, keyphrase-based system that performs an initial detection of rare disease mentions, which experts then validate and refine to build a consolidated dataset covering a subset of rare diseases. Based on this dataset, we carry out a series of experiments for rare disease classification using both a semi-supervised technique and state-of-the-art supervised systems based on…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Health Literacy and Information Accessibility
