Decoding MIE: A Novel Dataset Approach Using Topic Extraction and Affiliation Parsing
Ehsan Bitaraf, Maryam Jafarpour

TL;DR
This paper presents a new dataset derived from MIE conference proceedings, using topic extraction and affiliation parsing to enable comprehensive bibliometric and trend analyses in medical informatics.
Contribution
It introduces a novel, richly annotated dataset from nearly three decades of MIE publications, employing advanced text processing techniques for detailed bibliometric insights.
Findings
Identified patterns in DOI usage and citation trends
Revealed inconsistencies in author data
Observed a brief period of linguistic diversity
Abstract
The rapid expansion of medical informatics literature presents significant challenges in synthesizing and analyzing research trends. This study introduces a novel dataset derived from the Medical Informatics Europe (MIE) Conference proceedings, addressing the need for sophisticated analytical tools in the field. Using the Triple-A software, we extracted and processed metadata and abstracts from 4,606 articles published in the "Studies in Health Technology and Informatics" journal series, focusing on MIE conferences from 1996 onwards. Our methodology incorporated advanced techniques such as affiliation parsing using the TextRank algorithm. The resulting dataset, available in JSON format, offers a comprehensive view of bibliometric details, extracted topics, and standardized affiliation information. Analysis of this data revealed interesting patterns in Digital Object Identifier usage,…
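The abstract names the TextRank algorithm but does not describe how it was applied. As a rough illustration only, the following is a minimal, generic sketch of TextRank-style keyword ranking over a word co-occurrence graph. It is not the Triple-A implementation; the function name `textrank_keywords`, the small stopword list, and the parameter defaults (`window`, `damping`, `iterations`) are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real pipelines use larger ones).
STOPWORDS = {"the", "and", "for", "from", "with", "this", "that", "are", "was"}


def textrank_keywords(text, window=3, top_k=5, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on an undirected co-occurrence graph."""
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if len(w) > 2 and w not in STOPWORDS
    ]

    # Link each word to the words that follow it inside the sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the unweighted TextRank update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated

    # Highest score first; alphabetical order as a secondary key.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = (
        "medical informatics conference medical informatics dataset "
        "medical informatics topics"
    )
    print(textrank_keywords(sample, top_k=2))
```

Words that co-occur with many distinct neighbors accumulate higher scores, which is why this kind of ranking is often used to pull candidate topics out of abstracts. A fixed iteration count is used here for simplicity; a convergence threshold would be a reasonable alternative.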
Taxonomy
Topics: Text and Document Classification Technologies
