Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents
Bradley Butcher, Miri Zilka, Darren Cook, Jiri Hron, Adrian Weller

TL;DR
This paper proposes a human-in-the-loop framework for high-precision information extraction from text, combining machine speed with human validation to achieve higher precision than automated methods while taking less time than purely manual extraction.
Contribution
It introduces a novel framework and tool for weak-supervision labelling with human validation, demonstrating higher precision than fully automated methods and lower time cost than manual annotation on criminal justice datasets.
Findings
The human-in-the-loop approach matches the precision of manual annotation in less time.
Outperforms fully automated baselines in precision.
Effective on criminal justice datasets.
Abstract
While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly immediate results, but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches. We argue for the utility of a human-in-the-loop approach in applications where high precision is required, but purely manual extraction is infeasible. We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation. We demonstrate our approach on three criminal justice datasets. We find that the combination of computer speed and human understanding yields precision comparable to…
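The abstract names the general pattern (weak-supervision labelling proposes candidates, a human validates them) without spelling out the mechanics. The following is a minimal Python sketch of that pattern under stated assumptions: regex labelling functions, sentence-level extraction, and a callback standing in for the human reviewer. It is not the authors' tool; `keyword_lf`, `split_sentences`, `extract_with_validation`, `ExtractionResult`, and the `validate` callback are hypothetical names chosen for illustration, and the example text is made up.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A labelling function votes True (looks relevant) or abstains with None.
LabellingFn = Callable[[str], Optional[bool]]


def keyword_lf(pattern: str) -> LabellingFn:
    """Hypothetical regex labelling function: vote True on a match, else abstain."""
    regex = re.compile(pattern, re.IGNORECASE)

    def lf(sentence: str) -> Optional[bool]:
        return True if regex.search(sentence) else None

    return lf


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (illustration only): split after . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


@dataclass
class ExtractionResult:
    accepted: List[str] = field(default_factory=list)  # candidates the human confirmed
    rejected: List[str] = field(default_factory=list)  # candidates the human discarded
    sentences_total: int = 0  # every sentence seen, reviewed or not

    @property
    def sentences_reviewed(self) -> int:
        # Human effort: only flagged candidates are shown to the reviewer.
        return len(self.accepted) + len(self.rejected)


def extract_with_validation(
    documents: List[str],
    lfs: List[LabellingFn],
    validate: Callable[[str], bool],
) -> ExtractionResult:
    result = ExtractionResult()
    for doc in documents:
        for sentence in split_sentences(doc):
            result.sentences_total += 1
            # Machine step: any labelling function voting True makes a candidate.
            if any(lf(sentence) is True for lf in lfs):
                # Human step: the reviewer confirms or rejects the candidate.
                if validate(sentence):
                    result.accepted.append(sentence)
                else:
                    result.rejected.append(sentence)
    return result


# Usage with made-up text; the lambda stands in for a human reviewer.
docs = [
    "The defendant pleaded guilty. He was sentenced to 18 months. The weather was cold.",
    "No plea was entered. The hearing was adjourned.",
]
lfs = [keyword_lf(r"\bpleaded guilty\b"), keyword_lf(r"\bplea\b")]
result = extract_with_validation(docs, lfs, validate=lambda s: "guilty" in s)
print(result.accepted, result.sentences_reviewed, result.sentences_total)
# ['The defendant pleaded guilty.'] 2 5
```

In this sketch the reviewer reads only the sentences a labelling function flagged (2 of 5 here), which is where the time saving over manual reading comes from. Precision rests on the human check, while recall is bounded by what the labelling functions manage to flag.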
