Automatic Knowledge Extraction with Human Interface

Steve Schmidt; Denley Lam; Patrick Hayden

arXiv:2104.04415 · cs.HC · April 12, 2021

TL;DR

OrbWeaver is an automated system that combines natural language processing and a human interface to efficiently extract and model knowledge from documentation, demonstrated on cyber threat documents.

Contribution

It introduces OrbWeaver, a scalable and extensible system that improves knowledge extraction from complex documents using open source tools and a web-based interface.

Findings

1. Enhanced detection of hidden relationships in documents
2. Effective linking of co-related entities
3. Improved evidence gathering in knowledge extraction

Abstract

OrbWeaver, an automatic knowledge extraction system paired with a human interface, streamlines the use of unintuitive natural language processing software for modeling systems from their documentation. OrbWeaver enables the indirect transfer of knowledge about legacy systems by leveraging open source tools in document understanding and processing as well as using web-based user interface constructs. By design, OrbWeaver is scalable, extensible, and usable; we demonstrate its utility by evaluating its performance in processing a corpus of documents related to advanced persistent threats in the cyber domain. The results indicate better knowledge extraction by revealing hidden relationships, linking co-related entities, and gathering evidence.
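The abstract names three outcomes (revealing hidden relationships, linking co-related entities, gathering evidence) without saying here how they are computed. As a rough illustration only, the Python sketch below shows one common way such links can be formed: entities that co-occur in the same sentence are linked, and every supporting sentence is kept as evidence. This is not OrbWeaver's implementation. The gazetteer, the entity names (GroupA, AliasB, ToolX), and the two short reports are hypothetical, and a real pipeline would use a trained entity recognizer instead of string matching.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy gazetteer standing in for an NLP entity recognizer.
ENTITIES = {"GroupA", "AliasB", "ToolX", "spearphishing"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence):
    # Case-sensitive whole-word match of each gazetteer entry.
    return {e for e in ENTITIES if re.search(r"\b" + re.escape(e) + r"\b", sentence)}


def link_entities(documents):
    """Map each co-occurring entity pair to its evidence: a list of (doc_id, sentence)."""
    links = defaultdict(list)
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            found = sorted(find_entities(sentence))  # sorted so each pair has one key order
            for a, b in combinations(found, 2):
                links[(a, b)].append((doc_id, sentence))
    return dict(links)


# Hypothetical mini-corpus of two threat reports.
docs = {
    "report-1": "GroupA deployed ToolX. The group relied on spearphishing.",
    "report-2": "AliasB, also tracked as GroupA, used ToolX against targets.",
}

links = link_entities(docs)
for pair, evidence in sorted(links.items()):
    print(pair, len(evidence))
```

In this toy run the pair (GroupA, ToolX) is supported by one sentence from each report, while AliasB is connected to ToolX only through the second report. Keeping the supporting sentences alongside each pair is what lets a human reviewer check why two entities were linked, which is roughly what "linking co-related entities" and "gathering evidence" suggest.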

Taxonomy

Topics: Software Engineering Research · Data Quality and Management · Topic Modeling