Scalable and Ethical Insider Threat Detection through Data Synthesis and Analysis by LLMs

Haywood Gelman; John D. Hastings

arXiv:2502.07045·cs.CR·June 5, 2025

Open Access

TL;DR

This paper explores using large language models to analyze and detect insider threat sentiment in job reviews, employing synthetic data to address ethical concerns and demonstrating promising alignment with human evaluations.

Contribution

It introduces a scalable, ethical approach for insider threat detection using LLM-generated synthetic data and compares its effectiveness against human assessments.

Findings

01

LLMs align well with human evaluations in threat sentiment detection

02

LLM detection performance is higher on synthetic data than on human-generated data

03

Lower text diversity observed in synthetic datasets

Abstract

Insider threats wield an outsized influence on organizations, disproportionate to their small numbers. This is due to the internal access insiders have to systems, information, and infrastructure. Signals for such risks may be found in anonymous submissions to public web-based job search site reviews. This research studies the potential for large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews. Addressing ethical data collection concerns, this research utilizes synthetic data generation using LLMs alongside existing job review datasets. A comparative analysis of sentiment scores generated by LLMs is benchmarked against expert human scoring. Findings reveal that LLMs demonstrate alignment with human…
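
The benchmarking step in the abstract, comparing LLM-generated sentiment scores against expert human scores, can be illustrated with a small sketch. This is not the authors' method: the label set, the sample data, and the function names `agreement_rate` and `cohen_kappa` are all hypothetical, and the abstract does not say which agreement metric the paper uses. The sketch computes raw agreement and Cohen's kappa, a common chance-corrected agreement measure.

```python
from collections import Counter


def agreement_rate(human, llm):
    """Fraction of reviews where the human and the LLM gave the same label."""
    if len(human) != len(llm) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, llm)) / len(human)


def cohen_kappa(human, llm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_observed = agreement_rate(human, llm)
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(human_counts[k] * llm_counts[k] for k in human_counts) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical threat-sentiment labels for six job reviews (assumed, not from the paper).
human_scores = ["low", "high", "medium", "low", "high", "low"]
llm_scores = ["low", "high", "low", "low", "high", "medium"]

print(f"agreement: {agreement_rate(human_scores, llm_scores):.3f}")
print(f"kappa:     {cohen_kappa(human_scores, llm_scores):.3f}")
```

Kappa is reported alongside raw agreement because raw agreement alone can look high when one label dominates the dataset.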

Taxonomy

Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Digital and Cyber Forensics