A Consolidated System for Robust Multi-Document Entity Risk Extraction and Taxonomy Augmentation
Berk Ekmekci, Eleanor Hagerman, Blake Howald

TL;DR
This paper presents a hybrid system combining human input and automation for scalable extraction of entity-risk relations from large datasets, enhancing taxonomy coverage and extraction accuracy for risk analysis tasks.
Contribution
The paper introduces a simplified, scalable hybrid system for multi-document entity-risk extraction that improves coverage and accuracy through taxonomy expansion and token distance analysis.
Findings
Both single- and multi-sentence distance groups significantly outperform baseline extractions.
Shorter, single-sentence extractions are preferred by analysts.
Taxonomy expansion increases relevant information but introduces more indirect relations.
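To illustrate what expanding a keyword taxonomy with word vectors could look like, here is a minimal sketch that adds any candidate term whose cosine similarity to a seed keyword meets a threshold. The function names, the threshold value, and the three-dimensional toy vectors are all assumptions made for illustration; the summary does not state which embeddings or similarity criterion the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_taxonomy(seed_keywords, vectors, threshold=0.8):
    """Return the seed keywords plus every candidate close to at least one seed.

    `vectors` maps a term to its embedding. The threshold is a hypothetical
    tuning knob, not a value taken from the paper.
    """
    seeds = [s for s in seed_keywords if s in vectors]
    expanded = set(seed_keywords)
    for candidate, cvec in vectors.items():
        if candidate in expanded:
            continue
        if any(cosine(cvec, vectors[s]) >= threshold for s in seeds):
            expanded.add(candidate)
    return expanded


# Toy embeddings, invented for this example only.
TOY_VECTORS = {
    "fraud": (1.0, 0.0, 0.0),
    "embezzlement": (0.9, 0.1, 0.0),
    "lawsuit": (0.0, 1.0, 0.0),
    "picnic": (0.0, 0.0, 1.0),
}

print(sorted(expand_taxonomy({"fraud"}, TOY_VECTORS)))
```

In this sketch, lowering the threshold admits terms that are only loosely related to the seeds, which is one plausible way expansion could bring in both more relevant text and more indirect relations.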
Abstract
We introduce a hybrid human-automated system that provides scalable entity-risk relation extractions across large data sets. Given an expert-defined keyword taxonomy, entities, and data sources, the system returns text extractions based on bidirectional token distances between entities and keywords and expands taxonomy coverage with word vector encodings. Our system has a simpler architecture than alerting-focused systems, motivated by high-coverage use cases in the risk mining space such as due diligence activities and intelligence gathering. We provide an overview of the system and expert evaluations for a range of token distances. We demonstrate that single- and multi-sentence distance groups significantly outperform baseline extractions, with shorter, single sentences being preferred by analysts. As the taxonomy expands, the amount of relevant information…
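The idea of extracting text based on bidirectional token distances between entities and keywords can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes single-token entities and keywords, a naive regex tokenizer, and a hypothetical `max_distance` parameter. A keyword counts as a match whether it appears before or after the entity, and the window may cross a sentence boundary.

```python
import re


def tokenize(text):
    """Naive lowercase word tokenizer (an assumption, not the paper's)."""
    return re.findall(r"\w+", text.lower())


def extract_by_token_distance(text, entity, keywords, max_distance=5):
    """Find entity/keyword pairs within `max_distance` tokens of each other.

    Distance is measured in both directions (keyword before or after the
    entity). Returns a list of (keyword, distance, extracted span) tuples.
    """
    tokens = tokenize(text)
    entity = entity.lower()
    keywords = {k.lower() for k in keywords}
    entity_positions = [i for i, t in enumerate(tokens) if t == entity]
    keyword_positions = [i for i, t in enumerate(tokens) if t in keywords]

    hits = []
    for e in entity_positions:
        for k in keyword_positions:
            distance = abs(e - k)
            if distance <= max_distance:
                start, end = min(e, k), max(e, k)
                span = " ".join(tokens[start:end + 1])
                hits.append((tokens[k], distance, span))
    return hits


# Hypothetical input text and entity, invented for this example.
TEXT = "Regulators accused Acme of fraud last year. Acme denied it."
for hit in extract_by_token_distance(TEXT, "Acme", {"fraud"}, max_distance=3):
    print(hit)
```

Widening `max_distance` in this sketch moves from tight, single-sentence spans toward longer spans that can straddle sentences, which corresponds loosely to the range of token distances the abstract says were evaluated.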
Taxonomy
Topics: Advanced Text Analysis Techniques · Data Quality and Management · Topic Modeling
