Reference String Extraction Using Line-Based Conditional Random Fields

Martin Körner

arXiv:1705.08154 · cs.IR · May 24, 2017 · 2 cites

Open Access

TL;DR

This paper introduces a novel line-based conditional random field model for extracting individual reference strings from scientific publications, simplifying the process by considering entire lines as potential reference parts.

Contribution

The paper presents a classification approach that models reference string extraction at the line level using CRFs, which reduces model complexity compared to word-based models.

Findings

01. Effective line-based CRF model for reference extraction

02. Improved accuracy over traditional two-step methods

03. Simplified model complexity

Abstract

The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publication as a potential part of a reference string. By applying line-based conditional random fields rather than constructing the graphical model based on the individual words, dependencies and patterns that are typical in reference sections provide strong features while the overall complexity of the model is reduced.
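To make the line-level framing concrete, the following is a minimal sketch of the two pieces that surround a line-based sequence model: turning each line into a feature dictionary, and merging labelled lines back into reference strings. The label names (`B-REF`, `I-REF`, `O`), the feature set, and the helper names `line_features` and `group_reference_strings` are assumptions chosen for illustration; the abstract does not specify them. The CRF itself is omitted, and the labels are supplied by hand.

```python
import re
from typing import Dict, List

# Hypothetical label scheme (an assumption, not taken from the paper):
#   "B-REF" = first line of a reference string
#   "I-REF" = continuation line of the same reference string
#   "O"     = any other line of the publication

YEAR = re.compile(r"\b(19|20)\d{2}\b")
ENUM = re.compile(r"^\s*(\[\d+\]|\d+\.)")


def line_features(lines: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dictionary for line i."""
    line = lines[i]
    feats: Dict[str, object] = {
        "starts_with_enum": bool(ENUM.match(line)),   # e.g. "[12]" or "3."
        "has_year": bool(YEAR.search(line)),          # four-digit year present
        "comma_count": line.count(","),
        "ends_with_period": line.strip().endswith("."),
        # Position in the document, 0.0 (first line) to 1.0 (last line)
        "relative_position": round(i / max(len(lines) - 1, 1), 2),
    }
    if i > 0:
        # Context from the previous line, one way to expose line dependencies
        feats["prev_ends_with_period"] = lines[i - 1].strip().endswith(".")
    return feats


def group_reference_strings(lines: List[str], labels: List[str]) -> List[str]:
    """Merge labelled lines into one string per reference."""
    refs: List[str] = []
    current: List[str] = []
    for line, label in zip(lines, labels):
        if label == "B-REF":
            if current:
                refs.append(" ".join(current))
            current = [line.strip()]
        elif label == "I-REF" and current:
            current.append(line.strip())
        else:
            # "O", or an "I-REF" with no open reference, closes the current one
            if current:
                refs.append(" ".join(current))
            current = []
    if current:
        refs.append(" ".join(current))
    return refs


lines = [
    "Some body text.",
    "References",
    "[1] A. Author. A title of",
    "some paper. 2015.",
    "[2] B. Writer. Another title. 2016.",
]
labels = ["O", "O", "B-REF", "I-REF", "B-REF"]  # hand-written, not model output
refs = group_reference_strings(lines, labels)
```

Because every line of the publication receives a label, no separate reference-section detection step is needed in this sketch: lines outside the reference section are simply labelled `O`. In practice the feature dictionaries would be passed to a linear-chain CRF implementation, which would predict the labels instead of the hand-written list used here.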

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling