A Simple Extraction Procedure for Bibliographical Author Field
Pere Constans

TL;DR
This paper introduces a straightforward method for extracting author information from scholarly texts by using capitalization, line breaks, and layout templates, along with disambiguation rules.
Contribution
It presents a simple extraction procedure, built on two layout templates and a set of disambiguation rules, that handles varied author layouts on title pages.
Findings
Identifies author segments from capitalization and line-break patterns
Provides two main layout templates for diverse title pages
Includes disambiguation rules to improve accuracy
Abstract
A procedure for bibliographic author metadata extraction from scholarly texts is presented. The author segments are identified based on capitalization and line break patterns. Two main author layout templates, which can retrieve authors from a varied set of title pages, are provided. Additionally, several disambiguating rules are described.
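The idea of locating author segments from capitalization and line breaks can be illustrated with a minimal sketch. It is not the paper's procedure: the token pattern, the separator list, the 2 to 4 token limit, the assumption that the first non-empty line is the title, and the function names split_names and extract_authors are all hypothetical choices made for illustration. The paper's two layout templates and its disambiguating rules are not reproduced here.

```python
import re

# Hypothetical token pattern (not from the paper): a capitalized word,
# optionally hyphenated ("Li-Wong"), or a single initial such as "A."
NAME_TOKEN = re.compile(r"^(?:[A-Z][a-z]+(?:-[A-Z][a-z]+)*|[A-Z]\.)$")

# Hypothetical separators between authors listed on one line.
SEPARATORS = re.compile(r"\s*(?:,|;|\band\b|&)\s*")


def split_names(line):
    """Return the name-like segments of a line, or [] if any segment is not a name."""
    names = []
    for segment in SEPARATORS.split(line.strip()):
        tokens = segment.split()
        # Assumption: a name has 2 to 4 tokens, each a capitalized word or initial.
        if 2 <= len(tokens) <= 4 and all(NAME_TOKEN.match(t) for t in tokens):
            names.append(segment)
        elif segment:
            return []  # one non-name segment disqualifies the whole line
    return names


def extract_authors(title_page):
    """Return names from the first line that consists only of name-like segments."""
    # Line breaks delimit candidate segments; blank lines are dropped.
    lines = [ln for ln in title_page.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the title, so it is skipped.
    for line in lines[1:]:
        names = split_names(line)
        if names:
            return names
    return []


page = (
    "A Simple Extraction Procedure for Bibliographical Author Field\n"
    "Pere Constans\n"
    "\n"
    "Abstract\n"
    "A procedure for bibliographic author metadata extraction is presented.\n"
)
print(extract_authors(page))
```

A sketch this small misreads many real title pages, for example names with lowercase particles such as "van" or "de", all-caps names, or affiliations that happen to look like names. Cases of that kind are where layout templates and disambiguating rules, as described in the abstract, would be needed.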
Taxonomy
Topics: Authorship Attribution and Profiling · Web Data Mining and Analysis · Natural Language Processing Techniques
