Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation

Kuangxiangzi Liu; Dhiman Chakraborty; Alexander Liggesmeyer; Andreas Zeller

arXiv:2511.17977·cs.SE·November 25, 2025

Open Access

TL;DR

This paper presents a two-stage approach using large language models to convert natural language protocol specifications into formal, human-readable specs, enabling scalable and traceable test generation for safety-critical systems.

Contribution

The paper introduces a pipeline that extracts protocol elements from natural language and synthesizes formal specifications from them, improving traceability and human readability and enabling automated test generation.
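As a rough illustration of the two stages (extract protocol elements, then synthesize a specification from them), here is a minimal Python sketch. Everything in it is an assumption made for illustration: a regular expression stands in for the LLM extraction step, the toy protocol text and the names `MessageType`, `extract_elements`, `synthesize_spec`, and `client_test_inputs` are hypothetical, and the BNF-like output is not the paper's actual specification language. The refinement step against a protocol implementation is not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical toy specification text; not taken from the paper.
SAMPLE = (
    "The client sends a HELLO to open a session. "
    "The server sends an ACK in reply. "
    "The client sends QUIT to close."
)


@dataclass(frozen=True)
class MessageType:
    sender: str  # "client" or "server"
    name: str    # upper-case message name, e.g. "HELLO"


def extract_elements(spec_text: str) -> list[MessageType]:
    """Stage 1 stand-in: a regex replaces the LLM extraction step."""
    pattern = re.compile(r"\b([Cc]lient|[Ss]erver) sends (?:an? )?([A-Z]{2,})\b")
    found = (MessageType(s.lower(), n) for s, n in pattern.findall(spec_text))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


def synthesize_spec(elements: list[MessageType]) -> str:
    """Stage 2 stand-in: render the elements as an inspectable BNF-like text."""
    lines = []
    for sender in ("client", "server"):
        names = [e.name for e in elements if e.sender == sender]
        if names:
            alternatives = " | ".join(f'"{n}"' for n in names)
            lines.append(f"<{sender}_message> ::= {alternatives}")
    return "\n".join(lines)


def client_test_inputs(elements: list[MessageType]) -> list[str]:
    """Derive one test input per client message type in the specification."""
    return [e.name for e in elements if e.sender == "client"]


if __name__ == "__main__":
    elements = extract_elements(SAMPLE)
    print(synthesize_spec(elements))
    print(client_test_inputs(elements))
```

The intermediate text returned by `synthesize_spec` is the point of the design: a person can read and correct it before any tests are generated, which an end-to-end generator would not offer.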

Findings

01. Achieves 92.8% recovery of client message types

02. Recovers 80.2% of server message types

03. Achieves 81.5% message acceptance in real-world tests

Abstract

Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually, a process that is slow, error-prone, and difficult to scale. _Formal_ specifications, on the other hand, are well-suited for automated test generation, but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses large language models (LLMs) to bridge the gap: First, we extract _protocol elements_ from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal _protocol specification_ from these elements, which we can then use to massively test further implementations. We consider this two-stage approach superior to end-to-end LLM-based test generation, as 1. it produces an _inspectable…

Taxonomy

Topics: Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning · Web Application Security Vulnerabilities