Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach
Gustavo Roberto Pinto, Arthur do Prado Labaki, Rodrigo Sanches Miani

TL;DR
This study evaluates the performance of large language models in cybersecurity risk assessment, emphasizing the importance of human oversight due to their tendency to underestimate risks.
Contribution
The paper compares LLMs with human experts in cybersecurity risk assessment and highlights the need for human validation to obtain reliable results.
Findings
LLMs consistently underestimate cybersecurity risks compared to humans.
Human oversight is crucial to ensure accurate risk assessments.
LLMs should be used as complementary tools, not standalone assessors.
Abstract
Proper implementation of technical and administrative controls reinforces an organization's cybersecurity posture and business resilience, reduces risks, and enhances governance, ultimately elevating business maturity. The dynamics of the technological landscape and emerging threats negatively affect the most diverse companies, regardless of their size. This, associated with a global gap in the cybersecurity workforce, imposes enormous challenges and the need for a profound change in how companies respond to threats. Generative Artificial Intelligence from large language models has become an influential tool across various companies, emerging as a viable option to help address those challenges while partially addressing the shortage of skilled labor. Although large language models can help in this scenario, there may be risks, such as generating unreliable or 'hallucinated' content,…
