Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

Bradley Butcher; Miri Zilka; Darren Cook; Jiri Hron; Adrian Weller

arXiv:2302.09324 · cs.CL · February 21, 2023

TL;DR

This paper proposes a human-in-the-loop framework for high-precision information extraction from text. Combining machine speed with human validation, it reaches precision comparable to manual annotation in less time, and higher precision than fully automated baselines.

Contribution

It introduces a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation, and demonstrates the approach on three criminal justice datasets.
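The general idea of weak-supervision labelling followed by human validation can be sketched as below. This is a minimal illustration and not the authors' tool: the regex-based labelling functions, the `Candidate` type, and the `propose` and `validate` helpers are all hypothetical names assumed for this example, and the human reviewer is stood in for by a plain callable.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    doc_id: int   # index of the source document
    value: str    # extracted value proposed by the machine
    votes: int    # how many labelling functions agreed on this value


# Hypothetical labelling functions: each returns an extracted value or None.
def lf_sentenced_to(text: str) -> Optional[str]:
    m = re.search(r"sentenced to (\d+) years", text)
    return m.group(1) if m else None


def lf_term_of(text: str) -> Optional[str]:
    m = re.search(r"term of (\d+) years", text)
    return m.group(1) if m else None


def propose(docs: List[str],
            lfs: List[Callable[[str], Optional[str]]]) -> List[Candidate]:
    """Machine step: run every labelling function and keep the majority value."""
    candidates = []
    for doc_id, text in enumerate(docs):
        outputs = [out for lf in lfs if (out := lf(text)) is not None]
        if not outputs:
            continue  # no labelling function fired; nothing to validate
        value, votes = Counter(outputs).most_common(1)[0]
        candidates.append(Candidate(doc_id, value, votes))
    return candidates


def validate(candidates: List[Candidate],
             reviewer: Callable[[Candidate], bool]) -> Dict[int, str]:
    """Human step: keep only the candidates the reviewer accepts."""
    return {c.doc_id: c.value for c in candidates if reviewer(c)}


docs = [
    "He was sentenced to 5 years, a term of 5 years in total.",
    "No sentencing information is given.",
    "She was sentenced to 3 years.",
]
candidates = propose(docs, [lf_sentenced_to, lf_term_of])
# Stand-in for a human decision: here the reviewer rejects the second proposal.
accepted = validate(candidates, reviewer=lambda c: c.value == "5")
```

In this sketch the machine only proposes values, and nothing enters the final output without an explicit accept decision, which is the property that keeps precision under human control.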

Findings

01

The human-in-the-loop approach reaches precision comparable to manual annotation in less time.

02

It outperforms fully automated baselines in precision.

03

The approach is demonstrated on three criminal justice datasets.

Abstract

While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly-immediate results, but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches. We argue for the utility of a human-in-the-loop approach in applications where high precision is required, but purely manual extraction is infeasible. We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation. We demonstrate our approach on three criminal justice datasets. We find that the combination of computer speed and human understanding yields precision comparable to…

Taxonomy

Topics: Digital and Cyber Forensics · Advanced Malware Detection Techniques · Mobile Crowdsensing and Crowdsourcing

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings