LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature
Maxime Würsch, Andrei Kucharavy, Dimitri Percia David, and Alain Mermoud

TL;DR
This study evaluates the effectiveness of large language models in extracting cybersecurity knowledge entities, revealing their limitations and proposing a statistical noun extractor to better track emerging trends in the rapidly evolving domain.
Contribution
The paper introduces a statistical noun extractor to improve entity recognition in cybersecurity texts and assesses LLMs' performance in trend analysis within this domain.
Findings
LLMs perform poorly at cybersecurity concept extraction
The noun extractor shows potential for identifying relevant domain-specific nouns
Limitations exist in using LLMs for trend detection in cybersecurity
Abstract
The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometrics approaches show their limits in such a fast-evolving domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but our results show some potential for noun extractors. For this reason, we developed a noun extractor boosted with some statistical analysis to extract specific and relevant compound nouns from the domain. Later, we tested our model…
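To make the idea of a statistically boosted compound-noun extractor concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm, so everything here is an assumption for illustration. Instead of part-of-speech tagging, it uses stopword-delimited word runs as a crude proxy for compound nouns, and it ranks candidates by how much more frequent they are in a domain corpus than in a background corpus. The names `STOPWORDS`, `candidate_phrases`, and `score_phrases` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real list would be far larger).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to",
             "is", "are", "was", "were", "be", "it", "this", "that", "for"}


def candidate_phrases(text, max_len=3):
    """Yield 2..max_len word n-grams taken from runs of non-stopword tokens.

    This is a stand-in for compound-noun detection: no POS tagging is done.
    """
    for segment in re.split(r"[.,;:!?()\n]", text.lower()):
        tokens = re.findall(r"[a-z][a-z-]*", segment)
        run = []
        # The empty-string sentinel flushes the final run of each segment.
        for tok in tokens + [""]:
            if tok and tok not in STOPWORDS:
                run.append(tok)
                continue
            for n in range(2, max_len + 1):
                for i in range(len(run) - n + 1):
                    yield " ".join(run[i:i + n])
            run = []


def score_phrases(domain_docs, background_docs, min_count=2):
    """Rank phrases by domain rate divided by smoothed background rate."""
    dom = Counter(p for d in domain_docs for p in candidate_phrases(d))
    bg = Counter(p for d in background_docs for p in candidate_phrases(d))
    dom_total = sum(dom.values()) or 1
    bg_total = sum(bg.values())
    scores = {}
    for phrase, count in dom.items():
        if count < min_count:
            continue  # drop rare candidates, which are mostly noise
        dom_rate = count / dom_total
        bg_rate = (bg[phrase] + 1) / (bg_total + 1)  # add-one smoothing
        scores[phrase] = dom_rate / bg_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    domain = [
        "Buffer overflow attacks remain common. A buffer overflow corrupts memory.",
        "The buffer overflow was exploited.",
    ]
    background = ["The weather was nice and the food was good."]
    for phrase, score in score_phrases(domain, background):
        print(f"{phrase}\t{score:.3f}")
```

The frequency-ratio score is only one plausible choice of "statistical analysis"; other measures of domain specificity could be swapped in without changing the candidate-generation step.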
Taxonomy
Topics: Information and Cyber Security · Cybercrime and Law Enforcement Studies · Data Quality and Management
