Extracting Procedural Knowledge from Technical Documents

Shivali Agarwal; Shubham Atreja; Vikas Agarwal

arXiv:2010.10156 · cs.AI · October 21, 2020

TL;DR

This paper presents a method for automatically identifying and extracting multiple procedures from complex technical documents using structural and linguistic analysis, achieving 89% accuracy in procedure identification.

Contribution

It introduces a novel classification-based approach that uses structural and linguistic document features to extract procedures from documents of diverse formats, improving on unsupervised methods.
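The classification idea can be illustrated with a minimal sketch. This page does not describe the paper's actual features or classifier, so everything below is an assumption: three hypothetical features per text block (the share of lines carrying a step marker, the share of lines opening with a verb from a small hand-written imperative lexicon, and whether the first line starts with "To" or "How to"), a plain perceptron standing in for whatever classifier the authors used, and four invented toy training blocks. The names `features`, `train_perceptron`, `is_procedure`, and `IMPERATIVE_VERBS` are illustrative only.

```python
import re

# Hypothetical feature set, not the paper's. The imperative-verb lexicon is a
# hand-written stand-in; a real system would use a part-of-speech tagger.
IMPERATIVE_VERBS = {"click", "select", "open", "press", "enter", "install",
                    "remove", "connect", "restart", "type", "run"}

# A line that begins with "1.", "2)", "-", "*" or "•" followed by whitespace.
STEP_MARKER = re.compile(r"^\s*(\d+[.)]|[-*•])\s+")


def features(block: str) -> list[float]:
    """Return [bias, step-marker ratio, imperative ratio, 'To ...' heading flag]."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    if not lines:
        return [1.0, 0.0, 0.0, 0.0]
    n = len(lines)

    def first_word(line: str) -> str:
        # Drop any step marker, then take the lowercased first word.
        words = STEP_MARKER.sub("", line).split()
        return words[0].lower().strip(".,:") if words else ""

    step_ratio = sum(bool(STEP_MARKER.match(ln)) for ln in lines) / n
    imperative_ratio = sum(first_word(ln) in IMPERATIVE_VERBS for ln in lines) / n
    heading_flag = 1.0 if re.match(r"\s*(to|how to)\b", lines[0], re.I) else 0.0
    return [1.0, step_ratio, imperative_ratio, heading_flag]


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron updates; labels are +1 (procedure) or -1 (not)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score > 0 else -1) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def is_procedure(w, block: str) -> bool:
    """Classify a block as a procedure when its weighted feature score is positive."""
    return sum(wi * xi for wi, xi in zip(w, features(block))) > 0


# Toy training blocks, invented for illustration only.
TRAIN = [
    ("To reset the router:\n1. Press the reset button.\n2. Restart the device.", 1),
    ("Connect the cable.\nOpen the settings page.\nSelect Network.", 1),
    ("The router supports dual-band Wi-Fi.\nIt ships with a two-year warranty.", -1),
    ("Specifications\nWeight: 300 g\nPower: 12 V DC", -1),
]
W = train_perceptron([features(t) for t, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(is_procedure(W, "Install the driver.\nRestart the computer."))  # True
```

A real pipeline would swap the lexicon lookup for part-of-speech tagging and train on annotated manuals; the perceptron is used here only because it fits in a few standard-library lines.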

Findings

1. Achieved 89% accuracy in procedure identification.
2. Unsupervised techniques provided initial insights but had limitations.
3. The supervised approach significantly improved extraction performance.

Abstract

Procedures are an important knowledge component of documents that cognitive assistants can leverage for automation, question answering, or driving a conversation. Parsing large, dense documents such as product manuals and user guides to automatically identify which parts describe procedures, and then extracting those procedures, is a challenging problem. Most existing research has focused on extracting flows within given procedures or on understanding procedures in order to answer conceptual questions. Identifying and extracting multiple procedures automatically from documents of diverse formats remains relatively underexplored. In this work, we cover some of this ground by: 1) providing insights on how structural and linguistic properties of documents can be grouped to define types of procedures, 2) analyzing documents to extract the relevant linguistic and…
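The extraction task the abstract describes (finding which parts of a dense manual describe procedures and pulling them out) can be approximated, for the easy case of numbered lists, by a rule-based sketch. This is not the authors' method, which the summary describes as classification-based: the heuristics here (the nearest preceding non-step line becomes the title, a procedure is a run of consecutively numbered lines starting at 1, and `min_steps` defaults to 2) and the names `Procedure` and `extract_procedures` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# "1. text" or "2) text": captures the step number and the step text.
STEP = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


@dataclass
class Procedure:
    title: str
    steps: List[str] = field(default_factory=list)


def extract_procedures(text: str, min_steps: int = 2) -> List[Procedure]:
    """Collect runs of consecutively numbered lines, titled by the nearest preceding non-step line."""
    procedures: List[Procedure] = []
    title = ""
    current: Optional[Procedure] = None

    def flush() -> None:
        # Close the open run and keep it only if it has enough steps.
        nonlocal current
        if current is not None and len(current.steps) >= min_steps:
            procedures.append(current)
        current = None

    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines do not end a run
        m = STEP.match(line)
        if m is None:
            flush()
            title = line.strip()  # hypothetical: any non-step line may title the next run
            continue
        number, body = int(m.group(1)), m.group(2)
        if current is not None and number != len(current.steps) + 1:
            flush()  # numbering broke, so the current run ends here
        if current is None:
            if number != 1:
                continue  # hypothetical rule: a procedure must start at step 1
            current = Procedure(title=title)
        current.steps.append(body)
    flush()
    return procedures


# A small invented manual excerpt for illustration.
MANUAL = """Installing the printer driver
1. Download the driver package.
2. Run the installer.
3. Restart the computer.

Warranty
The printer is covered for two years.

Replacing the toner
1) Open the front cover.
2) Pull out the old cartridge.
"""

if __name__ == "__main__":
    for proc in extract_procedures(MANUAL):
        print(proc.title, proc.steps)
```

A rule like this only recognizes numbered lists and would miss procedures written as unnumbered prose, which is where linguistic cues such as imperative verbs would be needed.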

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies