NaïveRole: Author-Contribution Extraction and Parsing from Biomedical Manuscripts
Dominika Tkaczyk, Andrew Collins, Joeran Beel

TL;DR
This paper introduces NaïveRole, a novel machine learning approach for extracting structured author contribution roles from biomedical manuscripts, supported by a statistical analysis of contribution sections and achieving moderate extraction accuracy.
Contribution
It presents the first automatic method for extracting author roles from research papers, combining statistical analysis and Naïve Bayes classification.
Findings
NaïveRole achieves a precision of 0.68 and recall of 0.48.
Discovered common author roles using co-clustering and Open Information Extraction.
Built a training set automatically from PubMed Central contributions.
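To put the two reported figures on a single scale, the harmonic mean of precision and recall (the F1 score) can be computed directly from them. The snippet below is only this arithmetic; the summary above reports precision and recall, not F1, so the resulting value is derived here rather than quoted, and the function name `f1_score` is chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the findings above.
f1 = f1_score(0.68, 0.48)
print(round(f1, 2))
```

Because the harmonic mean is pulled toward the smaller of the two values, the lower recall weighs on the combined score more than the precision lifts it.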
Abstract
Information about the contributions of individual authors to scientific publications is important for assessing authors' achievements. Some biomedical publications have a short section that describes authors' roles and contributions. It is usually written in natural language, and hence author contributions cannot be trivially extracted in a machine-readable format. In this paper, we present 1) a statistical analysis of roles in author contributions sections, and 2) NaïveRole, a novel approach to extract structured authors' roles from author contribution sections. For the first part, we used co-clustering techniques, as well as Open Information Extraction, to semi-automatically discover the popular roles within a corpus of 2,000 contributions sections from PubMed Central. The discovered roles were used to automatically build a training set for NaïveRole, our role extraction approach,…
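The classification step can be illustrated with a small, generic multinomial Naïve Bayes text classifier. This is a minimal sketch, not the authors' implementation: the role labels (`design`, `analysis`, `writing`), the toy training phrases, the whitespace tokenizer, and the class name `NaiveBayesRoleClassifier` are all assumptions made for illustration, and the paper's actual features and role inventory are not described in this summary.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesRoleClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()            # number of phrases per role
        self.word_counts = defaultdict(Counter)  # token counts per role
        self.vocab = set()
        for text, role in examples:
            self.class_counts[role] += 1
            for tok in tokenize(text):
                self.word_counts[role][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_role, best_score = None, -math.inf
        for role, n in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n / total)
            denom = sum(self.word_counts[role].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[role][tok] + 1) / denom)
            if score > best_score:
                best_role, best_score = role, score
        return best_role


# Toy training phrases with made-up role labels (illustrative only).
TRAIN = [
    ("designed the experiments", "design"),
    ("conceived and designed the study", "design"),
    ("analyzed the data", "analysis"),
    ("performed statistical analysis of the data", "analysis"),
    ("wrote the manuscript", "writing"),
    ("drafted and wrote the paper", "writing"),
]

clf = NaiveBayesRoleClassifier().fit(TRAIN)
print(clf.predict("analyzed the data"))
```

In a setting like the one the abstract describes, the labeled phrases would come from the automatically built training set rather than a hand-written list, and each phrase in a contributions section would be classified separately and attached to the authors it mentions.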
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
