Constructing Flow Graphs from Procedural Cybersecurity Texts

Kuntal Kumar Pal; Kazuaki Kashihara; Pratyay Banerjee; Swaroop Mishra; Ruoyu Wang; Chitta Baral

arXiv:2105.14357 · cs.CL · June 1, 2021

TL;DR

This paper introduces a method for extracting instruction flows from procedural texts, especially cybersecurity texts, and representing them as graphs that can be visualized. It relies on a large annotated dataset and graph neural networks, and the approach transfers to other domains.

Contribution

It presents a new annotated dataset (CTFW) and a graph neural network approach for structure recovery from procedural texts, demonstrating cross-domain generalizability.
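As a rough illustration of the graph neural network component, the sketch below applies one graph convolution layer, using the standard symmetric-normalized GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), to a set of sentence vectors. It assumes each sentence has already been encoded as a vector (in an LM-GNN setup these would come from a language model such as BERT; toy vectors stand in here) and that the sentence adjacency matrix is given. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (no external dependencies)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer over sentence nodes: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n sentence adjacency (hypothetical input).
    h:   n x d sentence embeddings (e.g. from BERT; assumed precomputed).
    w:   d x k weight matrix (learned in practice; fixed here).
    """
    n = len(adj)
    # Add self-loops so each sentence keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, then project with W.
    out = matmul(matmul(norm, h), w)
    # Element-wise ReLU.
    return [[max(0.0, x) for x in row] for row in out]


# Three sentences linked in a chain: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
w = [[1.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, h, w))
```

Each output row mixes a sentence's own vector with those of its neighbors, which is the kind of context a plain per-sentence BERT classifier does not see.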

Findings

01 Graph Convolution Network (GCN) with BERT outperforms BERT alone.

02 The model achieves high accuracy on cybersecurity, maintenance, and cooking texts.

03 The approach enables better visualization of, and reasoning over, instruction flows.

Abstract

Following procedural texts written in natural languages is challenging. We must read the whole text to identify the relevant information or identify the instruction flows to complete a task, which is prone to failures. If such texts are structured, we can readily visualize instruction-flows, reason or infer a particular step, or even build automated systems to help novice agents achieve a goal. However, this structure recovery task is a challenge because of such texts' diverse nature. This paper proposes to identify relevant information from such texts and generate information flows between sentences. We built a large annotated procedural text dataset (CTFW) in the cybersecurity domain (3154 documents). This dataset contains valuable instructions regarding software vulnerability analysis experiences. We performed extensive experiments on CTFW with our LM-GNN model variants in multiple…
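To make the target structure concrete, here is a minimal sketch of what identifying relevant sentences and generating information flows between them could look like as a data structure. It assumes two upstream predictions that are hypothetical here: a per-sentence relevance flag and a score for each forward sentence pair. The 0.5 threshold, the names `FlowGraph`, `build_flow_graph`, and `flow_order`, and the example sentences are illustrative only and do not come from the paper or the FlowGraph repository.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FlowGraph:
    """Directed graph whose nodes are sentence indices (hypothetical structure)."""
    sentences: List[str]
    edges: Dict[int, List[int]] = field(default_factory=dict)


def build_flow_graph(
    sentences: List[str],
    relevant: List[bool],
    edge_scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,
) -> FlowGraph:
    """Keep relevant sentences as nodes; add i -> j when the pair score >= threshold.

    relevant:    per-sentence predictions (assumed to come from a classifier).
    edge_scores: {(i, j): probability} for ordered pairs (assumed input).
    """
    graph = FlowGraph(sentences=sentences)
    for i, keep in enumerate(relevant):
        if keep:
            graph.edges[i] = []
    for (i, j), score in sorted(edge_scores.items()):
        # Only forward edges between sentences judged relevant.
        if i < j and i in graph.edges and j in graph.edges and score >= threshold:
            graph.edges[i].append(j)
    return graph


def flow_order(graph: FlowGraph) -> List[int]:
    """Topological order of the flow (Kahn's algorithm), ties broken by sentence index."""
    indeg = {n: 0 for n in graph.edges}
    for dsts in graph.edges.values():
        for d in dsts:
            indeg[d] += 1
    ready = [n for n, d in indeg.items() if d == 0]
    heapq.heapify(ready)
    order: List[int] = []
    while ready:
        n = heapq.heappop(ready)
        order.append(n)
        for d in graph.edges[n]:
            indeg[d] -= 1
            if indeg[d] == 0:
                heapq.heappush(ready, d)
    return order


sentences = [
    "Download the binary.",
    "This challenge was fun.",
    "Run checksec on it.",
    "Find the overflow in main.",
]
relevant = [True, False, True, True]
edge_scores = {(0, 1): 0.9, (0, 2): 0.8, (2, 3): 0.7, (0, 3): 0.2}
g = build_flow_graph(sentences, relevant, edge_scores)
print([g.sentences[i] for i in flow_order(g)])
```

Walking the graph in topological order recovers the instruction steps while skipping the irrelevant commentary sentence.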

Code & Models

Repositories

kuntalkumarpal/FlowGraph
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Dropout