Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles

Jennifer D'Souza; Soeren Auer

arXiv:2109.00199·cs.IR·September 20, 2021

TL;DR

This paper presents a rule-based method for automatically extracting salient scientific entities from the titles of Computational Linguistics (CL) articles. It exploits lexico-syntactic pattern regularities in titles and reaches an average precision of 75%.

Contribution

The work introduces a pattern-based approach for extracting scientific entities from article titles. It covers six entity types (research problem, solution, resource, language, tool, and method), and its rules were developed on 50,237 CL titles from the ACL Anthology.

Findings

01 Achieved an average precision of 75% in entity extraction.

02 Extracted over 87,000 scientific entities across six types: research problems, solutions, resources, languages, tools, and methods.

03 Showed that pattern regularities in titles are effective for identifying salient scientific terms.

Abstract

We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article's contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
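To make the idea of a positional lexico-syntactic rule concrete, here is a minimal Python sketch. It is not the authors' rule set: the two patterns, the connector words `for` and `using`, the entity labels assigned to each position, and the function name `extract_entities` are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules, NOT the rules from the paper.
# Assumed pattern 1: "<solution> for <research problem> [using <method>]"
FOR_USING = re.compile(
    r"^(?P<solution>.+?)\s+for\s+(?P<research_problem>.+?)"
    r"(?:\s+using\s+(?P<method>.+))?$",
    re.IGNORECASE,
)
# Assumed pattern 2: "<research problem> using <method>"
USING_ONLY = re.compile(
    r"^(?P<research_problem>.+?)\s+using\s+(?P<method>.+)$",
    re.IGNORECASE,
)


def extract_entities(title):
    """Return a dict of entity type -> phrase for the first rule that matches."""
    title = " ".join(title.split())  # collapse repeated whitespace
    for pattern in (FOR_USING, USING_ONLY):
        match = pattern.match(title)
        if match:
            # Drop optional groups that did not take part in the match.
            return {k: v for k, v in match.groupdict().items() if v}
    return {}  # no rule fired; the title yields no entities


if __name__ == "__main__":
    print(extract_entities(
        "A Neural Parser for Semantic Role Labeling using Graph Attention"
    ))
    print(extract_entities("Word Sense Disambiguation using Wikipedia"))
```

A title that matches none of the rules yields an empty result. Rules of this kind favour precision over recall, in keeping with the abstract's emphasis on patterns that are easily recognizable and frequent.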

Code & Models

Repositories

jd-coderepos/cl-titles-parser (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies