Extracting Procedural Knowledge from Technical Documents
Shivali Agarwal, Shubham Atreja, Vikas Agarwal

TL;DR
This paper presents a method for automatically identifying and extracting multiple procedures from complex technical documents using structural and linguistic analysis, reporting 89% accuracy in procedure identification.
Contribution
It introduces a novel classification-based approach leveraging document features to extract procedures from diverse formats, improving over unsupervised methods.
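To make the idea of classifying text blocks from document features concrete, here is a minimal sketch. It is not the authors' classifier: the two features (share of lines with a step marker, share of lines opening with an imperative verb), the verb list, the equal weighting, and the 0.5 threshold are all assumptions chosen for illustration, and a supervised model trained on labeled blocks would replace the hand-set rule.

```python
import re

# Hypothetical verb list for illustration; not taken from the paper.
IMPERATIVE_VERBS = {
    "click", "select", "open", "press", "enter",
    "install", "run", "type", "choose", "restart",
}

# Structural cue: a line that starts with "1.", "2)", a bullet, or "Step 3".
STEP_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•]|step\s+\d+)\s+", re.IGNORECASE)


def block_features(lines):
    """Return illustrative structural and linguistic features for a block."""
    lines = [ln for ln in lines if ln.strip()]
    if not lines:
        return {"step_marker_ratio": 0.0, "imperative_ratio": 0.0}

    marked = 0
    imperative = 0
    for ln in lines:
        match = STEP_MARKER.match(ln)
        if match:
            marked += 1
        # Linguistic cue: first word after any step marker is an imperative verb.
        body = ln[match.end():] if match else ln
        words = re.findall(r"[A-Za-z]+", body)
        if words and words[0].lower() in IMPERATIVE_VERBS:
            imperative += 1

    n = len(lines)
    return {"step_marker_ratio": marked / n, "imperative_ratio": imperative / n}


def looks_like_procedure(lines, threshold=0.5):
    """Toy decision rule (an assumption): average the two ratios and threshold."""
    feats = block_features(lines)
    score = (feats["step_marker_ratio"] + feats["imperative_ratio"]) / 2
    return score >= threshold


procedure_block = [
    "1. Open the settings panel.",
    "2. Select Network.",
    "3. Click Apply.",
]
prose_block = [
    "The device supports several network modes.",
    "Each mode has different power requirements.",
]
print(looks_like_procedure(procedure_block), looks_like_procedure(prose_block))
```

In a supervised setting, the dictionary returned by `block_features` would serve as the feature vector for a trained classifier instead of feeding a fixed threshold.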
Findings
Achieved 89% accuracy in procedure identification
Unsupervised techniques provided initial insights but had limitations
Supervised approach significantly improved extraction performance
Abstract
Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question answering, or driving a conversation. It is challenging to parse large, dense documents such as product manuals and user guides to automatically determine which parts describe procedures and then extract them. Most existing research has focused on extracting flows from given procedures or on understanding procedures in order to answer conceptual questions. Automatically identifying and extracting multiple procedures from documents of diverse formats remains a comparatively under-addressed problem. In this work, we cover some of this ground by: 1) providing insights on how structural and linguistic properties of documents can be grouped to define types of procedures, 2) analyzing documents to extract the relevant linguistic and…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
