SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis
Imtiaz Karim, Kazi Samin Mubasshir, Mirza Masfiqur Rahman, and Elisa Bertino

TL;DR
This paper introduces SPEC5G, a large-scale dataset of 5G protocol texts designed to facilitate NLP-based analysis, security assessment, and summarization, thereby automating aspects of protocol development and security analysis.
Contribution
The paper presents the first public 5G dataset for NLP research, enabling automated security analysis and protocol summarization using large-scale language models.
Findings
Effective security-related text classification using the dataset
Successful protocol summarization to aid understanding
Demonstrated value in automating 5G protocol analysis
Abstract
5G is the 5th generation cellular network protocol. It is the state-of-the-art global wireless standard that enables an advanced kind of network designed to connect virtually everyone and everything with increased speed and reduced latency. Therefore, its development, analysis, and security are critical. However, all approaches to 5G protocol development and security analysis, e.g., property extraction, protocol summarization, and semantic analysis of the protocol specifications and implementations, are completely manual. To reduce such manual effort, in this paper, we curate SPEC5G, the first-ever public 5G dataset for NLP research. The dataset contains 3,547,586 sentences with 134M words, from 13,094 cellular network specifications and 13 online websites. By leveraging large-scale pre-trained language models that have achieved state-of-the-art results on NLP tasks, we use this…
