LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature

Maxime Würsch; Andrei Kucharavy; Dimitri Percia David; Alain Mermoud

arXiv:2312.07110 · cs.CL · December 13, 2023 · 1 citation

Open Access

TL;DR

This study evaluates the effectiveness of large language models in extracting cybersecurity knowledge entities, revealing their limitations and proposing a statistical noun extractor to better track emerging trends in the rapidly evolving domain.

Contribution

The paper introduces a statistical noun extractor to improve entity recognition in cybersecurity texts and assesses LLMs' performance in trend analysis within this domain.

Findings

01. LLMs perform poorly at cybersecurity concept extraction

02. The noun extractor shows potential for identifying relevant domain-specific nouns

03. Limitations exist in using LLMs for trend detection in cybersecurity

Abstract

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. Standard bibliometric approaches have been shown to reach their limits in such a fast-evolving domain. We therefore use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but they show some potential for noun extractors. For this reason, we developed a noun extractor boosted with some statistical analysis to extract specific and relevant compound nouns from the domain. Later, we tested our model…

Taxonomy

Topics: Information and Cyber Security · Cybercrime and Law Enforcement Studies · Data Quality and Management