OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection

Jeffrey Flynt

arXiv:2603.22499 · cs.CR · March 25, 2026


TL;DR

OrgForge-IT introduces a verifiable synthetic benchmark for insider threat detection built on a deterministic simulation engine, which provides consistent ground truth and enables realistic, scenario-based evaluation of LLM-based detectors.

Contribution

The paper presents a verifiable synthetic benchmark whose architecture guarantees cross-artifact consistency, addressing the limitations of existing static datasets such as CERT and enabling more comprehensive evaluation of threat detection.

Findings

01. Models show varied verdict accuracy despite similar triage performance.

02. False-positive rates significantly affect verdict accuracy and how well models tolerate noise.

03. Victim attribution distinguishes threat detection tiers and informs response strategies.

Abstract

Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset, the field's canonical benchmark, is also static, lacks cross-surface correlation scenarios, and predates the LLM era. We present OrgForge-IT, a verifiable synthetic benchmark in which a deterministic simulation engine maintains ground truth and language models generate only surface prose, making cross-artifact consistency an architectural guarantee. The corpus spans 51 simulated days, 2,904 telemetry records at a 96.4% noise rate, and four detection scenarios designed to defeat single-surface and single-day triage strategies across three threat classes and eight injectable behaviors. A ten-model leaderboard reveals several findings: (1) triage and verdict accuracy dissociate: eight models achieve…
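The abstract's central design choice, a deterministic engine that owns every fact while a language model only phrases those facts, can be illustrated with a minimal Python sketch. Everything below is hypothetical and not taken from OrgForge-IT: `Event`, `simulate`, `render`, `consistent`, `noise_rate`, and the action names are illustrative, and `render` is a plain string template standing in for an LLM call. The point is that a fixed seed reproduces identical ground truth, and every rendered artifact can be checked against the record it came from.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical ground-truth record owned by the simulation engine."""
    day: int
    actor: str
    action: str
    is_threat: bool


def simulate(seed: int, days: int, actors: list[str]) -> list[Event]:
    """Deterministic engine: the same seed always yields the same events."""
    rng = random.Random(seed)
    events = []
    for day in range(1, days + 1):
        for actor in actors:
            # Benign background activity (the "noise").
            action = rng.choice(["login", "file_read", "email_send"])
            events.append(Event(day, actor, action, is_threat=False))
    # Inject one hypothetical threat behavior on a seeded day and actor.
    insider = rng.choice(actors)
    events.append(Event(rng.randint(1, days), insider, "bulk_download", True))
    return events


def render(event: Event) -> str:
    """Stand-in for the LLM: it only phrases facts it is handed."""
    return f"Day {event.day}: {event.actor} performed {event.action}."


def consistent(event: Event, text: str) -> bool:
    """Check that the prose restates the engine's facts for this event."""
    return (f"Day {event.day}:" in text
            and event.actor in text
            and event.action in text)


def noise_rate(events: list[Event]) -> float:
    """Fraction of records that carry no threat signal."""
    return sum(not e.is_threat for e in events) / len(events)


if __name__ == "__main__":
    evs = simulate(seed=7, days=51, actors=["alice", "bob"])
    assert evs == simulate(seed=7, days=51, actors=["alice", "bob"])
    assert all(consistent(e, render(e)) for e in evs)
    print(len(evs), round(noise_rate(evs), 3))
```

Because the prose is derived from the records rather than the reverse, a contradiction between two artifacts can only come from a rendering error, which a check like `consistent` can catch. For scale, a 96.4% noise rate over 2,904 records leaves only about 3.6% of them, roughly 105 records, carrying signal.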


Code & Models

Datasets

aeriesec/orgforge-insider-threat


Taxonomy

Topics: Network Security and Intrusion Detection · Adversarial Robustness in Machine Learning · Information and Cyber Security