Reference String Extraction Using Line-Based Conditional Random Fields
Martin Körner

TL;DR
This paper introduces a line-based conditional random field (CRF) model for extracting individual reference strings from scientific publications. It simplifies the task by treating every line of a publication as a potential part of a reference string.
Contribution
It presents a classification approach that models reference string extraction at the line level using CRFs, which reduces model complexity compared to word-based graphical models.
Findings
Effective line-based CRF model for reference extraction
Improved accuracy over traditional two-step methods
Simplified model complexity
Abstract
The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publication as a potential part of a reference string. By applying line-based conditional random fields rather than constructing the graphical model based on the individual words, dependencies and patterns that are typical in reference sections provide strong features while the overall complexity of the model is reduced.
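To illustrate the line-level framing described in the abstract, here is a minimal Python sketch. It is not the author's implementation: the feature names, the `B-REF` / `I-REF` / `O` label scheme, and the helper functions `line_features` and `group_reference_strings` are all assumptions chosen for illustration. The sketch shows (1) how each line, rather than each word, could be turned into a feature dictionary for a sequence model such as a CRF, and (2) how per-line labels could be assembled into reference strings. Training and decoding of the CRF itself are omitted.

```python
import re


def line_features(lines, i):
    """Build a feature dict for line i (hypothetical feature set).

    Each line is one node of the sequence model, so features describe
    the whole line and its immediate predecessor.
    """
    line = lines[i].strip()
    prev_line = lines[i - 1].strip() if i > 0 else ""
    return {
        # Lines like "[12] ..." or "12. ..." often begin a reference.
        "starts_with_index": bool(re.match(r"^(\[\d+\]|\d+\.)\s", line)),
        # A four-digit year is a common cue inside reference strings.
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", line)),
        "ends_with_period": line.endswith("."),
        "prev_ends_with_period": prev_line.endswith("."),
        "is_blank": line == "",
    }


def group_reference_strings(lines, labels):
    """Join labeled lines into reference strings.

    Assumed label scheme: "B-REF" starts a reference, "I-REF" continues
    the current one, and "O" marks a line outside any reference.
    """
    refs, current = [], []
    for line, label in zip(lines, labels):
        if label == "B-REF":
            if current:
                refs.append(" ".join(current))
            current = [line.strip()]
        elif label == "I-REF" and current:
            current.append(line.strip())
        else:
            # "O", or a stray "I-REF" with no open reference.
            if current:
                refs.append(" ".join(current))
            current = []
    if current:
        refs.append(" ".join(current))
    return refs


# Toy input; in practice the labels would come from a trained model.
sample_lines = [
    "References",
    "[1] A. Author. A title of",
    "a paper. 2010.",
    "[2] B. Writer. Another title. 2012.",
    "Appendix",
]
sample_labels = ["O", "B-REF", "I-REF", "B-REF", "O"]
sample_refs = group_reference_strings(sample_lines, sample_labels)
```

Because every line in the publication receives a label, a setup like this needs no separate step to locate the reference section first: lines outside it simply receive the `O` label.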
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling
