Classifying Web Exploits with Topic Modeling
Jukka Ruohonen

TL;DR
This paper demonstrates that topic modeling combined with database meta-data can classify web and other proof-of-concept exploits with an accuracy near 0.9 on a dataset of over 36 thousand exploits, which points to the potential for semi-automatic exploit classification.
Contribution
It introduces a novel application of topic modeling and text mining for classifying web exploits, improving upon previous manual or less automated methods.
Findings
Achieved a classification accuracy near 0.9
Text mining significantly enhances classification performance
Potential for semi-automatic exploit classification in tracking systems
Abstract
This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment obtains an accuracy rate near 0.9. Text mining and topic modeling contribute substantially to this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it offers a few scholarly observations about the potential for semi-automatic classification of exploits in existing tracking infrastructures.
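The general idea in the abstract, combining topic mixtures from exploit text with database meta-data as classifier features, can be sketched roughly as follows. This is only an illustrative sketch and not the paper's implementation: the summary does not say which topic model or classifier was used, so the collapsed Gibbs sampler for LDA and the nearest-centroid classifier are assumptions, and the four toy descriptions, the two-column meta-data encoding, and the labels `web` and `other` are invented for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed topic proportions; each row sums to one.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]

def fit_centroids(X, y):
    """Mean feature vector per class label."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(x, cents):
    """Label of the centroid closest to x in squared Euclidean distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))

# Invented toy data: tokenized descriptions, hypothetical one-hot meta-data, labels.
docs = [
    "sql injection in login php parameter".split(),
    "cross site scripting in search php form".split(),
    "buffer overflow in ftp server daemon".split(),
    "stack overflow crash in media player".split(),
]
meta = [[1, 0], [1, 0], [0, 1], [0, 1]]  # assumed encoding of a platform field
labels = ["web", "web", "other", "other"]

topics = lda_gibbs(docs, n_topics=2)
features = [t + m for t, m in zip(topics, meta)]  # topic mixture + meta-data
centroids = fit_centroids(features, labels)
predictions = [predict(x, centroids) for x in features]
```

On this toy data the meta-data columns alone separate the two classes, so the predictions say nothing about real-world accuracy; the sketch only shows how the two feature sources can be concatenated before classification.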
