Scalable and Ethical Insider Threat Detection through Data Synthesis and Analysis by LLMs
Haywood Gelman, John D. Hastings

TL;DR
This paper explores using large language models to analyze and detect insider threat sentiment in job reviews, employing synthetic data to address ethical concerns and demonstrating promising alignment with human evaluations.
Contribution
It introduces a scalable, ethical approach for insider threat detection using LLM-generated synthetic data and compares its effectiveness against human assessments.
Findings
LLMs align well with human evaluations in threat sentiment detection
LLM detection performance is higher on synthetic data than on human-generated data
Lower text diversity observed in synthetic datasets
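As a rough illustration of what comparing text diversity between datasets can look like, the sketch below computes a distinct-n ratio (unique n-grams divided by total n-grams). This summary does not say which diversity measure the paper uses, so distinct-n is an assumption chosen for simplicity, and the function name and sample reviews are hypothetical.

```python
def distinct_n(texts, n=2):
    """Return unique n-grams / total n-grams across a list of texts.

    Assumption: whitespace tokenization and lowercase folding; this is an
    illustrative diversity measure, not necessarily the paper's metric.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        # Build n-grams by zipping n shifted views of the token list.
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


# Hypothetical reviews, invented for illustration (not data from the paper).
repetitive_reviews = ["good pay good pay", "good pay good pay"]
varied_reviews = ["management ignores staff", "great team and benefits"]

repetitive_score = distinct_n(repetitive_reviews)
varied_score = distinct_n(varied_reviews)
```

A lower ratio means more repeated phrasing, which is the kind of pattern the finding above describes for synthetic datasets.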
Abstract
Insider threats wield an outsized influence on organizations, disproportionate to their small numbers. This is due to the internal access insiders have to systems, information, and infrastructure. Signals for such risks may be found in anonymous submissions to public web-based job search site reviews. This research studies the potential for large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews. Addressing ethical data collection concerns, this research utilizes synthetic data generation using LLMs alongside existing job review datasets. A comparative analysis of sentiment scores generated by LLMs is benchmarked against expert human scoring. Findings reveal that LLMs demonstrate alignment with human…
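The benchmarking step described above, where LLM-generated scores are compared with expert human scores for the same reviews, can be sketched as follows. This is a minimal illustration under stated assumptions: the scoring scale, the tolerance parameter, and the two summary statistics are hypothetical choices, since this summary does not specify how the paper measures alignment.

```python
from statistics import mean


def agreement_summary(llm_scores, human_scores, tolerance=0):
    """Summarize how closely paired LLM and human scores agree.

    Assumption: both lists hold numeric scores for the same reviews in the
    same order. `tolerance` is a hypothetical knob for counting near-matches.
    """
    if len(llm_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    if not llm_scores:
        raise ValueError("score lists must not be empty")
    diffs = [abs(l - h) for l, h in zip(llm_scores, human_scores)]
    return {
        # Fraction of reviews where the two scores differ by at most `tolerance`.
        "agreement_rate": sum(d <= tolerance for d in diffs) / len(diffs),
        # Average size of the disagreement across all reviews.
        "mean_abs_diff": mean(diffs),
    }


# Hypothetical scores on an illustrative 1-5 scale (not data from the paper).
llm = [1, 4, 3, 5, 2]
human = [1, 4, 2, 5, 2]
summary = agreement_summary(llm, human)
```

Reporting both an agreement rate and a mean absolute difference separates how often the raters match from how far apart they are when they do not.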
Taxonomy
Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Digital and Cyber Forensics
