Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights
Saba Pourhanifeh, AbdulAziz AbdulGhaffar, Ashraf Matrawy

TL;DR
This paper systematically evaluates domain-adapted and general-purpose language models for structured threat modelling in 5G security, revealing limitations of current LLMs and emphasizing the need for task-specific reasoning.
Contribution
It provides a comprehensive empirical analysis of 52 configurations across 8 models, highlighting the impact of domain adaptation, model size, decoding, and prompting on threat classification performance.
Findings
Domain-adapted models do not consistently outperform general models.
Decoding strategies significantly influence output validity.
Larger models generally perform better, but the gains are inconsistent.
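To make the notion of "output validity" concrete, here is a minimal sketch of how a model's free-text answer could be checked against the six STRIDE categories. The strict rule used here (the output must be exactly one category name, ignoring case and punctuation) and the helper names `parse_stride_label` and `validity_rate` are assumptions for illustration; this summary does not state the paper's actual validity criterion.

```python
import re

# The six STRIDE threat categories.
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information Disclosure",
    "Denial of Service",
    "Elevation of Privilege",
)


def _normalise(text):
    """Lower-case and collapse every non-letter run to a single space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


# Map normalised names back to their canonical spelling.
_CANONICAL = {_normalise(name): name for name in STRIDE}


def parse_stride_label(raw_output):
    """Return the canonical category if the output is exactly one
    STRIDE category name, otherwise None (assumed validity rule)."""
    return _CANONICAL.get(_normalise(raw_output))


def validity_rate(outputs):
    """Fraction of outputs that parse to a valid STRIDE category."""
    if not outputs:
        return 0.0
    valid = sum(1 for out in outputs if parse_stride_label(out) is not None)
    return valid / len(outputs)
```

Under this rule, an answer such as `" denial-of-service.\n"` counts as valid, while a hedged sentence or an answer naming two categories does not. A looser rule (for example, accepting the first category mentioned) would yield different validity rates for the same decoding setup.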
Abstract
Large Language Models (LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior work has primarily evaluated a number of general-purpose LLMs under limited prompting settings. In this study, we extend research on structured threat modelling by systematically comparing domain-adapted language models of different sizes with their general-purpose counterparts. We use both LLMs and Small Language Models (SLMs) that were domain-adapted to telecommunications and cybersecurity. For structured threat modelling, we selected the widely used STRIDE approach, with 5G security as the application area. We present a comprehensive empirical evaluation using 52 different configurations (on 8 different language models) to analyze the impact of 1) domain adaptation, 2) model scale, 3) decoding strategies…
