Automatic Knowledge Extraction with Human Interface
Steve Schmidt, Denley Lam, Patrick Hayden

TL;DR
OrbWeaver is an automated system that combines natural language processing and a human interface to efficiently extract and model knowledge from documentation, demonstrated on cyber threat documents.
Contribution
The paper introduces OrbWeaver, a scalable and extensible system that improves knowledge extraction from complex documents using open source tools and a web-based interface.
Findings
Enhanced detection of hidden relationships in documents
Effective linking of co-related entities
Improved evidence gathering in knowledge extraction
Abstract
OrbWeaver, an automatic knowledge extraction system paired with a human interface, streamlines the use of unintuitive natural language processing software for modeling systems from their documentation. OrbWeaver enables the indirect transfer of knowledge about legacy systems by leveraging open source tools for document understanding and processing, together with web-based user interface constructs. By design, OrbWeaver is scalable, extensible, and usable; we demonstrate its utility by evaluating its performance in processing a corpus of documents related to advanced persistent threats in the cyber domain. The results indicate better knowledge extraction by revealing hidden relationships, linking co-related entities, and gathering evidence.
Taxonomy
Topics: Software Engineering Research · Data Quality and Management · Topic Modeling
