Classifying Web Exploits with Topic Modeling

Jukka Ruohonen

arXiv:1710.05561 · cs.CR · October 17, 2017

TL;DR

This paper shows that topic modeling combined with database meta-data can classify web and other proof-of-concept (PoC) exploits with an accuracy near 0.9 on a dataset of over 36 thousand exploits, which points to the potential for semi-automatic exploit classification.

Contribution

It applies topic modeling and text mining to the classification of web exploits and adds to the line of research that enhances software vulnerability information with text mining.

Findings

01. Achieved a classification accuracy near 0.9

02. Text mining and topic modeling are a significant boost to classification performance

03. Potential for semi-automatic exploit classification in existing tracking infrastructures

Abstract

This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment obtains an accuracy rate near 0.9. Text mining and topic modeling are a significant boost factor behind this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it also offers a few scholarly observations about the potential for semi-automatic classification of exploits in the existing tracking infrastructures.
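
To make the general idea concrete, the following is a minimal sketch of how topic proportions from exploit descriptions might be combined with a meta-data attribute and fed to a classifier. It is not the paper's pipeline: the abstract does not name the topic model variant, the meta-data fields, or the classifier. Everything here is an assumption for illustration, including the toy descriptions, the binary `is_php` flag, the category labels, the collapsed Gibbs sampler for LDA, and the nearest-centroid classifier used as a stand-in.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (document, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean feature vector per label."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def nearest_centroid_predict(model, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


# Hypothetical toy records: (description, is_php meta-data flag, category).
toy = [
    ("sql injection in login php parameter", 1, "webapps"),
    ("cross site scripting in search php parameter", 1, "webapps"),
    ("sql injection in admin php page", 1, "webapps"),
    ("remote buffer overflow crash in ftp server", 0, "dos"),
    ("denial of service crash in ftp daemon", 0, "dos"),
    ("stack buffer overflow crash in media player", 0, "dos"),
]

docs = [text.split() for text, _, _ in toy]
theta = lda_gibbs(docs, n_topics=2)
# Feature vector = topic proportions followed by the meta-data flag.
X = [t + [float(flag)] for t, (_, flag, _) in zip(theta, toy)]
y = [label for _, _, label in toy]
model = nearest_centroid_fit(X, y)
predictions = [nearest_centroid_predict(model, x) for x in X]
```

The predictions above are made on the training records themselves, so they only serve as a smoke check of the plumbing. A meaningful accuracy estimate such as the one reported in the paper would need a held-out evaluation on real exploit data.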
