Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset

Jennifer D'Souza; Sören Auer

arXiv:2010.04388 · cs.CL · May 10, 2021

TL;DR

This paper presents a structured annotation scheme for extracting and normalizing NLP research contributions from scholarly articles, creating a dataset of sentences, phrases, and triples to facilitate building a knowledge graph.

Contribution

The work introduces NLPCONTRIBUTIONGRAPH, a novel annotation and normalization scheme that structures NLP contributions directly from article sentences, with a focus on reducing annotation noise and improving consistency.

Findings

01

Created a dataset of 900 contribution-centered sentences, 4,702 phrases, and 2,980 triples from 50 NLP articles.

02

Achieved an intra-annotation agreement F1 of 67.92% for sentences.

03

Demonstrated integration with the Open Research Knowledge Graph.

Abstract

Purpose: The aim of this work is to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) pilot stage, to define the scheme (described in prior work); and 2) adjudication stage, to normalize the graphing model (the focus of this paper).

Design/methodology/approach: We re-annotate the contributions-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing…
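The three-layer data pipeline named in the abstract (sentences, phrases, triples) can be pictured with a small data structure. The following is a minimal sketch only, not the authors' format: the class names `Triple` and `ContributionSentence`, the helper functions, and the example sentence with its annotations are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) statement built from annotated phrases."""
    subject: str
    predicate: str
    obj: str


@dataclass
class ContributionSentence:
    """One contribution-centered sentence with its phrase and triple layers."""
    text: str
    phrases: List[str] = field(default_factory=list)
    triples: List[Triple] = field(default_factory=list)


def phrases_in_sentence(sentence: ContributionSentence) -> bool:
    """Check that every annotated phrase occurs verbatim in the sentence text."""
    return all(phrase in sentence.text for phrase in sentence.phrases)


def dataset_counts(sentences: List[ContributionSentence]) -> Tuple[int, int, int]:
    """Return (number of sentences, number of phrases, number of triples)."""
    n_phrases = sum(len(s.phrases) for s in sentences)
    n_triples = sum(len(s.triples) for s in sentences)
    return len(sentences), n_phrases, n_triples


# Invented example; not taken from the dataset described in the paper.
example = ContributionSentence(
    text="We propose a bidirectional LSTM model for named entity recognition.",
    phrases=["propose", "bidirectional LSTM model", "for", "named entity recognition"],
    triples=[
        Triple("Contribution", "propose", "bidirectional LSTM model"),
        Triple("bidirectional LSTM model", "for", "named entity recognition"),
    ],
)

print(phrases_in_sentence(example))
print(dataset_counts([example]))
```

Applied to the full dataset, a count of this kind would correspond to the sentence, phrase, and triple totals reported in the findings, assuming each annotation is stored once per sentence.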
