Unleashing the Power of LLM to Infer State Machine from the Protocol Implementation

Haiyang Wei; Ligeng Chen; Zhengjie Du; Yuhan Wu; Haohui Huang; Yue Liu; Guang Cheng; Fengyuan Xu; Linzhang Wang; Bing Mao

arXiv:2405.00393·cs.CR·March 28, 2025

TL;DR

This paper introduces ProtocolGPT, a novel approach using Large Language Models with retrieval augmented generation to accurately infer state machines from protocol implementation source code, overcoming limitations of traditional analysis methods.

Contribution

It presents the first LLM-based state machine inference method leveraging source code, achieving high precision and improved vulnerability detection in protocol analysis.

Findings

1. Over 90% precision in state machine inference
2. Outperforms baselines by over 30% in accuracy
3. Uncovered two 0-day vulnerabilities with enhanced fuzzing

Abstract

State machines are essential for enhancing protocol analysis to identify vulnerabilities. However, inferring state machines from network protocol implementations is challenging due to complex code syntax and semantics. Traditional dynamic analysis methods often miss critical state transitions due to limited coverage, while static analysis faces path explosion issues. To overcome these challenges, we introduce a novel state machine inference approach utilizing Large Language Models (LLMs), named ProtocolGPT. This method employs retrieval augmented generation technology to enhance a pre-trained model with specific knowledge from protocol implementations. Through effective prompt engineering, we accurately identify and infer state machines. To the best of our knowledge, our approach represents the first state machine inference that leverages the source code of protocol implementations. Our…
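The retrieval-augmented flow described in the abstract can be illustrated with a minimal sketch. This is not the ProtocolGPT implementation: every function name, the chunk size, the keyword-overlap scoring (a crude stand-in for embedding similarity), and the `SRC --MESSAGE--> DST` answer format are assumptions made for illustration only. The sketch splits source code into chunks, retrieves the chunks most relevant to a query, builds a prompt around them, and parses a model's answer into a transition table. No model is called here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase identifier-like tokens; a crude stand-in for an embedding model."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())

def chunk_source(source, lines_per_chunk=4):
    """Split source code into fixed-size line chunks (hypothetical chunking policy)."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def retrieve(chunks, query, k=2):
    """Return the k chunks sharing the most tokens with the query."""
    q = Counter(tokenize(query))

    def score(chunk):
        c = Counter(tokenize(chunk))
        return sum(min(q[t], c[t]) for t in q)

    return sorted(chunks, key=score, reverse=True)[:k]

def build_prompt(query, context_chunks):
    """Assemble a prompt from retrieved code; the wording is illustrative only."""
    context = "\n---\n".join(context_chunks)
    return (f"Code excerpts:\n{context}\n\nTask: {query}\n"
            "Answer one transition per line as: SRC --MESSAGE--> DST")

# Assumed answer format: one "SRC --MESSAGE--> DST" transition per line.
TRANSITION = re.compile(r"^\s*(\w+)\s*--(\w+)-->\s*(\w+)\s*$")

def parse_transitions(answer):
    """Parse model output into {state: {message: next_state}}, skipping other lines."""
    machine = {}
    for line in answer.splitlines():
        m = TRANSITION.match(line)
        if m:
            src, msg, dst = m.groups()
            machine.setdefault(src, {})[msg] = dst
    return machine
```

In use, the string returned by `build_prompt` would be sent to whichever model is available, and the reply passed to `parse_transitions`. A real pipeline would likely replace the token-overlap scoring with vector similarity over embedded code chunks.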

Code & Models

Repositories

s1awwhy/protocolgpt (official)

Taxonomy

Topics: Machine Learning and Algorithms