A Library Perspective on Nearly-Unsupervised Information Extraction Workflows in Digital Libraries

Hermann Kroll; Jan Pirklbauer; Florian Plötzky; Wolf-Tilo Balke

arXiv:2205.00716 · cs.CL · May 3, 2022

TL;DR

This paper examines the challenges and opportunities of implementing nearly-unsupervised information extraction workflows in digital libraries, analyzing case studies across various domains to assess quality and practical handling.

Contribution

It provides an analysis of unsupervised extraction workflows in digital libraries, highlighting opportunities, limitations, and best practices based on case studies.

Findings

01

Unsupervised extraction can be effective but produces non-canonicalized results.

02

Domain-specific data influences extraction quality.

03

Best practices can improve workflow reliability.

Abstract

Information extraction can support novel and effective access paths for digital libraries. Nevertheless, designing reliable extraction workflows can be cost-intensive in practice. On the one hand, suitable extraction methods rely on domain-specific training data. On the other hand, unsupervised and open extraction methods usually produce non-canonicalized extraction results. This paper tackles the question of how digital libraries can handle such extractions and whether their quality is sufficient in practice. We focus on unsupervised extraction workflows by analyzing them in case studies in the domains of encyclopedias (Wikipedia), pharmacy, and political sciences. We report on opportunities and limitations. Finally, we discuss best practices for unsupervised extraction workflows.
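
To make the term "non-canonicalized" concrete, the following is a minimal sketch of what canonicalizing open-extraction output might look like. It is not taken from the paper or its repository: the sample triples, the `PREDICATE_VOCAB` mapping, and the `canonicalize` function are hypothetical names chosen for illustration, and a hand-written synonym table is assumed in place of whatever method a real workflow would use.

```python
from collections import defaultdict

# Hypothetical output of an open extraction step: the same fact appears
# with different surface forms for the entities and the predicate.
raw_triples = [
    ("Metformin", "is used to treat", "diabetes"),
    ("metformin", "treats", "Diabetes"),
    ("Metformin", "was given for", "diabetes"),
    ("Albert Einstein", "was born in", "Ulm"),
]

# Hypothetical hand-written vocabulary: canonical predicate -> surface forms.
PREDICATE_VOCAB = {
    "treats": {"treats", "is used to treat", "was given for"},
    "born_in": {"was born in"},
}


def canonicalize(triples, vocab):
    """Map predicates onto the vocabulary, lowercase entities, count duplicates."""
    lookup = {form: canon for canon, forms in vocab.items() for form in forms}
    counts = defaultdict(int)
    for subj, pred, obj in triples:
        canon = lookup.get(pred.lower().strip())
        if canon is None:
            # Predicates outside the vocabulary are dropped in this sketch.
            continue
        counts[(subj.lower(), canon, obj.lower())] += 1
    return dict(counts)


canonical = canonicalize(raw_triples, PREDICATE_VOCAB)
for triple, support in canonical.items():
    print(triple, support)
```

In this sketch, three differently worded extractions collapse into one canonical statement with a support count, which is the kind of consolidation a library would need before exposing such extractions as an access path.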

Code & Models

Repositories

hermannkroll/kgextractiontoolbox
Official
