An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Ben Wellner; Andrew McCallum; Fuchun Peng; Michael Hay

arXiv:1207.4157·cs.LG·July 19, 2012·5 cites

TL;DR

This paper presents an integrated approach to information extraction and coreference resolution based on conditionally-trained undirected graphical models. On a data set of research paper citations, letting each task inform the other reduces error in both citation matching and field extraction.

Contribution

It introduces a novel integrated inference framework that jointly models extraction and coreference, enhancing performance over separate systems.

Findings

01

Reduced error in citation matching

02

Improved extraction accuracy through coreference

03

Effective use of extraction uncertainty

Abstract

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and using coreference to improve the accuracy of the extracted fields.
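To make the idea of coreference as graph partitioning concrete, here is a minimal sketch in Python. It is not the paper's model: the paper learns its parameters by conditional training, whereas this example assumes hand-set signed pairwise weights (positive means two citation mentions likely refer to the same paper, negative means they likely do not) and uses a simple greedy merge heuristic. The function name `partition_mentions` and the example weights are hypothetical.

```python
from itertools import combinations


def partition_mentions(n, weight):
    """Greedily partition mentions 0..n-1 given signed pairwise weights.

    `weight` maps a pair (i, j) with i < j to a score; missing pairs count as 0.
    This is an illustrative heuristic, not the inference procedure of the paper.
    """
    clusters = [{i} for i in range(n)]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            # Total weight on edges that a merge of clusters a and b would join.
            gain = sum(
                weight.get((min(i, j), max(i, j)), 0.0)
                for i in clusters[a]
                for j in clusters[b]
            )
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            # No merge has positive total weight, so stop.
            return clusters
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] |= clusters[b]
        del clusters[b]


# Hypothetical weights for four citation mentions.
example_weights = {
    (0, 1): 2.0, (0, 2): -1.5, (0, 3): -1.0,
    (1, 2): -0.5, (1, 3): -1.0, (2, 3): 1.0,
}
result = partition_mentions(4, example_weights)
```

With these assumed weights, mentions 0 and 1 end up in one cluster and mentions 2 and 3 in another, because merging the two clusters would add only negative edge weight.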

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Natural Language Processing Techniques