SecGoal: A Benchmark for Security Goal Extraction and Formalization from Protocol Documents
Dawei Huang, Hui Li, Haonan Feng, Jingjing Guan, Yueshuang Jiao, Bo Jia (Beijing University of Posts, Telecommunications)

TL;DR
SecGoal introduces a benchmark and framework for extracting and formalizing security goals from protocol documents, demonstrating that instruction-tuned models significantly outperform larger general models in this task.
Contribution
This work provides the first expert-annotated benchmark for security goal extraction and a specialized AI-assisted framework for formalization from protocol texts.
Findings
Frontier models have high recall but low precision in security goal classification.
Instruction tuning on SecGoal improves small model F1-scores above 80%.
The dataset and baseline facilitate future automated formal protocol analysis.
Abstract
Formal verification provides rigorous guarantees for cryptographic security, yet automating the extraction and formalization of security goals from natural language protocol documents remains a major bottleneck, compounded by the scarcity of expert-annotated resources and integrated frameworks bridging unstructured text and symbolic logic. We introduce SecGoal, the first expert-annotated benchmark covering 15 widely deployed protocol documents, including 5G-AKA and TLS 1.3, and AIFG, an AI-assisted framework that decomposes the task into context-aware goal extraction and retrieval-augmented formalization. We conduct a comprehensive evaluation to assess whether contemporary LLMs are ready to automate this pipeline. Our results reveal a pronounced precision-recall imbalance: frontier models, such as Gemini 2.5-Pro, achieve high recall but precision below 15%, frequently misclassifying…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
