Semi-Automated Protocol Disambiguation and Code Generation
Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, Barath Raghavan

TL;DR
This paper introduces SAGE, a system that uses NLP to identify ambiguities in protocol specifications and automatically generate interoperable code after clarifications, improving protocol implementation accuracy.
Contribution
SAGE is a novel NLP-based system that detects ambiguities in protocol specs and automates code generation, enhancing protocol implementation reliability.
Findings
Identified 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC.
After clarification, SAGE generates ICMP code that interoperates with Linux implementations; the approach generalizes to BFD, IGMP, and NTP.
Potential to extend to TCP and BGP.
Abstract
For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, SAGE, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the spec author, SAGE can generate protocol code automatically. Using SAGE, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after clarification, SAGE is able to automatically generate code that interoperates perfectly with Linux implementations. We show that SAGE generalizes to BFD, IGMP, and NTP. We also find that SAGE supports many of the conceptual components found in key protocols,…
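The workflow the abstract describes (flag problematic sentences, have the spec author clarify them, then generate code) can be illustrated with a toy triage loop. This is only a sketch under stated assumptions, not SAGE's actual pipeline: the function `triage`, the table `TOY_READINGS`, the example sentences, and the bucket names are all hypothetical and invented for illustration, and the rule used here (exactly one candidate reading means a sentence is ready for code generation) is an assumption of the sketch rather than something this summary states.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an NLP interpreter: each invented sentence maps to
# the candidate readings a parser might produce. None of this is RFC text.
TOY_READINGS: Dict[str, List[str]] = {
    "The checksum field is set to zero.": ["hdr.checksum = 0"],
    "The addresses are reversed.": ["swap(src, dst)", "reverse_bytes(src)"],
    "The message is returned.": [],
}


def lookup_readings(sentence: str) -> List[str]:
    """Return candidate readings for a sentence (empty list if none are known)."""
    return TOY_READINGS.get(sentence, [])


def triage(
    sentences: List[str],
    interpret: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Sort sentences by how many candidate readings the interpreter yields.

    Assumption of this sketch: one reading is treated as unambiguous, several
    readings as ambiguous, and no reading as unparsed. The latter two buckets
    are what a spec author would be asked to clarify.
    """
    report: Dict[str, List[str]] = {"unambiguous": [], "ambiguous": [], "unparsed": []}
    for sentence in sentences:
        readings = interpret(sentence)
        if len(readings) == 1:
            report["unambiguous"].append(sentence)
        elif len(readings) > 1:
            report["ambiguous"].append(sentence)
        else:
            report["unparsed"].append(sentence)
    return report


if __name__ == "__main__":
    result = triage(list(TOY_READINGS), lookup_readings)
    for bucket, items in result.items():
        print(f"{bucket}: {items}")
```

In this toy setup, only sentences in the `unambiguous` bucket would be passed on to a code generator; the others would go back to the author for rewriting.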
