Harvesting the Public MeSH Note field
Anastasios Nentidis, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

TL;DR
This paper presents a semi-automated method using regular expressions to analyze the Public MeSH Note field, extracting historical information about new descriptors' status with minimal manual effort.
Contribution
It introduces a semi-automated approach for analyzing semi-structured MeSH data, reducing manual effort in extracting historical descriptor information.
Findings
Previous Supplementary Concept Record status extracted with minimal manual effort in the large majority of cases
Open-source code available for reproducibility
Method applicable to large-scale MeSH data analysis
Abstract
In this document, we report an analysis of the Public MeSH Note field of the new descriptors introduced in the MeSH thesaurus between 2006 and 2020. The aim of this analysis was to extract information about the previous status of these new descriptors as Supplementary Concept Records. The Public MeSH Note field contains information in semi-structured text, meant to be read by humans. Therefore, we adopted a semi-automated approach, based on regular expressions, to extract information from it. In the large majority of cases, we managed to minimize the required manual effort for extracting the previous state of a new descriptor as a Supplementary Concept Record. The source code for this analysis is openly available on GitHub.
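The semi-automated approach described in the abstract can be sketched roughly as follows: a regular expression handles the notes that follow a regular phrasing, and anything it cannot parse is set aside for manual review. This is only an illustrative sketch, not the authors' code. The note phrasing ("... was indexed under ..."), the field names, and the helper functions `parse_note` and `split_notes` are assumptions made for the example; the actual patterns used in the analysis are in the authors' GitHub repository.

```python
import re

# Hypothetical note shape, assumed for illustration only:
#   "<year introduced>; <OLD NAME> was indexed under <HEADING> <start>-<end>"
NOTE_PATTERN = re.compile(
    r"^(?P<intro_year>\d{4});\s*"
    r"(?P<old_name>.+?)\s+was indexed under\s+"
    r"(?P<parent>.+?)\s+"
    r"(?P<start>\d{4})-(?P<end>\d{4})$"
)


def parse_note(note):
    """Return the extracted fields as a dict, or None if the note does not match."""
    match = NOTE_PATTERN.match(note.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the year fields from strings to integers.
    for key in ("intro_year", "start", "end"):
        fields[key] = int(fields[key])
    return fields


def split_notes(notes):
    """Split notes into automatically parsed ones and ones left for manual review.

    `notes` maps a descriptor identifier to its Public MeSH Note text.
    """
    parsed, manual = {}, []
    for descriptor_id, note in notes.items():
        result = parse_note(note)
        if result is None:
            manual.append(descriptor_id)
        else:
            parsed[descriptor_id] = result
    return parsed, manual


if __name__ == "__main__":
    # Made-up identifiers and note texts, not real MeSH records.
    sample = {
        "D_EXAMPLE_1": "2015; EXAMPLE COMPOUND was indexed under EXAMPLE HEADING 1998-2014",
        "D_EXAMPLE_2": "2012",
    }
    parsed, manual = split_notes(sample)
    print(parsed)
    print(manual)
```

Returning `None` for unmatched notes, rather than guessing, keeps the automatic and manual parts of the workflow separate: only the notes that the pattern does not cover need human attention.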
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Topic Modeling
