Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset
Jennifer D'Souza, Sören Auer

TL;DR
This paper presents a structured annotation scheme for extracting and normalizing NLP research contributions from scholarly articles, creating a dataset of sentences, phrases, and triples to facilitate building a knowledge graph.
Contribution
The work introduces NLPCONTRIBUTIONGRAPH, a novel annotation and normalization scheme for structuring NLP contributions directly from article sentences, with a focus on reducing noise and improving consistency.
Findings
Created a dataset with 900 sentences, 4,702 phrases, and 2,980 triples.
Achieved an intra-annotation agreement F1 of 67.92% for sentences.
Demonstrated integration with the Open Research Knowledge Graph.
Abstract
Purpose: The aim of this work is to normalize the NLPCONTRIBUTIONS scheme (henceforth, NLPCONTRIBUTIONGRAPH) for structuring the contributions information in Natural Language Processing (NLP) scholarly articles directly from article sentences. This is done via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper).
Design/methodology/approach: We re-annotate the contributions-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing…
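To make the three-level data pipeline (sentences, phrases, triples) concrete, the following Python sketch shows one possible in-memory representation of a single contribution-centered sentence, the phrases extracted from it, and the triples built from those phrases. The class names, the example sentence, and the two consistency checks are hypothetical illustrations chosen for this sketch; they are not the NLPCONTRIBUTIONGRAPH file format or the authors' tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) statement built from annotated phrases."""
    subject: str
    predicate: str
    obj: str


@dataclass
class ContributionSentence:
    """One contribution-centered sentence with its phrases and triples.

    Hypothetical layout for illustration only.
    """
    text: str
    phrases: list[str] = field(default_factory=list)
    triples: list[Triple] = field(default_factory=list)

    def ungrounded_phrases(self) -> list[str]:
        """Return phrases that do not occur verbatim in the sentence.

        Assumption of this sketch: phrases are spans copied from the sentence.
        """
        return [p for p in self.phrases if p not in self.text]

    def triples_outside_phrases(self) -> list[Triple]:
        """Return triples with any element not drawn from the phrase list.

        Assumption of this sketch: every triple element is an annotated phrase.
        """
        allowed = set(self.phrases)
        return [
            t for t in self.triples
            if not {t.subject, t.predicate, t.obj} <= allowed
        ]


def count_units(sentences: list[ContributionSentence]) -> dict[str, int]:
    """Total the annotation units at each pipeline level."""
    return {
        "sentences": len(sentences),
        "phrases": sum(len(s.phrases) for s in sentences),
        "triples": sum(len(s.triples) for s in sentences),
    }


# Hypothetical example sentence, not taken from the dataset.
example = ContributionSentence(
    text="We propose a BiLSTM-CRF model for named entity recognition.",
    phrases=["propose", "BiLSTM-CRF model", "for", "named entity recognition"],
    triples=[Triple("BiLSTM-CRF model", "for", "named entity recognition")],
)

print(example.ungrounded_phrases())       # []
print(example.triples_outside_phrases())  # []
print(count_units([example]))             # {'sentences': 1, 'phrases': 4, 'triples': 1}
```

Keeping phrases and triples attached to their source sentence mirrors the pipeline order described in the abstract, and aggregating with `count_units` over a full collection would yield totals of the same kind as the sentence, phrase, and triple counts listed under Findings.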
