Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing

Alessia Cioffi; Silvio Peroni

arXiv:2205.14677 · cs.DL · September 7, 2022

TL;DR

This paper evaluates seven tools that extract and parse bibliographic references from PDF articles, comparing their performance on a corpus of 56 articles from 27 subject areas to identify the most effective solutions.

Contribution

It provides a comprehensive comparative assessment of the existing tools that extract and parse bibliographic references from PDF articles, highlighting their strengths and weaknesses.

Findings

1. Anystyle achieved the highest overall performance.
2. Cermine ranked second overall.
3. In some subject areas, other tools achieved better results on specific tasks.

Abstract

Many solutions have been proposed for extracting bibliographic references from PDF papers. Machine learning, rule-based, and regular-expression approaches are among the methods most commonly adopted by tools addressing this task. This work aims to identify and evaluate all and only the tools that, given a full-text paper in PDF format, can recognise, extract, and parse bibliographic references. We identified seven tools: Anystyle, Cermine, ExCite, Grobid, Pdfssa4met, Scholarcy, and Science Parse. We compared and evaluated them against a corpus of 56 PDF articles published in 27 subject areas. Anystyle obtained the best overall score, followed by Cermine. However, in some subject areas, other tools achieved better results on specific tasks.
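The abstract does not spell out how the tools' outputs were scored. As a minimal sketch, assuming each tool's parsed references and the gold standard for one article are both lists of field-to-value records (a hypothetical format, not the authors' data model), the comparison can be framed as micro-averaged precision, recall, and F1 over (field, value) pairs. This is a generic metric chosen for illustration, not necessarily the one used in the paper; the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical record format (an assumption): each parsed reference maps a
# field name such as "author", "title" or "year" to a string value.
Reference = Mapping[str, str]


def _normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are not errors."""
    return " ".join(value.lower().split())


def _field_bag(refs: Iterable[Reference]) -> Counter:
    """Multiset of (field, normalised value) pairs across all references of one article."""
    return Counter(
        (field, _normalise(value))
        for ref in refs
        for field, value in ref.items()
        if value.strip()  # ignore empty fields
    )


def field_scores(predicted: Iterable[Reference], gold: Iterable[Reference]) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 over (field, value) pairs."""
    pred_bag = _field_bag(predicted)
    gold_bag = _field_bag(gold)
    # Counter '&' keeps the minimum count of each pair: the matched pairs.
    true_pos = sum((pred_bag & gold_bag).values())
    n_pred = sum(pred_bag.values())
    n_gold = sum(gold_bag.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, a predicted record that matches the gold author and title (after normalisation) but has the wrong year yields precision, recall, and F1 of 2/3 each. Pooling pairs per article avoids having to align predicted and gold references one-to-one, at the cost of not penalising fields attached to the wrong reference.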
