Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis

Matteo Esposito; Francesco Palagiano; Valentina Lenarduzzi; Davide Taibi

arXiv:2406.10273 · cs.CL · September 10, 2024

Open Access

TL;DR

This study evaluates the effectiveness of large language models (LLMs), particularly Retrieval-Augmented Generation (RAG) and fine-tuned variants, in risk analysis. It shows that they can be faster and can uncover hidden risks, despite slightly lower accuracy than human experts.

Contribution

The paper provides the first empirical comparison of LLMs with human experts in mission-critical risk analysis, highlighting their strengths and optimal use cases.

Findings

01. RAG-assisted LLMs have the lowest hallucination rates

02. LLMs are quicker and more actionable than humans

03. Human experts outperform LLMs in accuracy

Abstract

Context. Risk analysis assesses potential risks in specific scenarios. Its principles are context-independent: the same methodology can be applied to a risk whether it is connected to health or to information technology security. Risk analysis requires vast knowledge of national and international regulations and standards, and it is time- and effort-intensive. A large language model (LLM) can summarize information faster than a human and can be fine-tuned to specific tasks.

Aim. Our empirical study investigates the effectiveness of Retrieval-Augmented Generation and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis.

Method. We manually curated 193 unique scenarios leading to 1283 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the…

Taxonomy

Topics: Risk and Safety Analysis · Software Reliability and Analysis Research · Software Engineering Techniques and Practices

Methods: WordPiece · Label Smoothing · Linear Warmup With Linear Decay · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings · Cosine Annealing · Multi-Head Attention