OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora

Jeffrey Flynt

arXiv:2603.14997·cs.CL·April 10, 2026


TL;DR

OrgForge is an open-source multi-agent simulation framework that creates verifiable synthetic organizational data by modeling processes rather than documents, improving fidelity over LLM-generated corpora.

Contribution

OrgForge introduces a deterministic simulation engine that enforces a physics-cognition boundary, producing organizational artifacts that are traceable to ground truth and free of LLM hallucination errors.

Findings

01  Achieves 0.46 improvement in prose-to-ground-truth fidelity over LLM baselines.

02  Simulates organizational processes producing cross-artifact causal cascades.

03  Identifies a hallucination failure mode in chained LLM document generation.

Abstract

Building and evaluating enterprise AI systems requires synthetic organizational corpora that are internally consistent, temporally structured, and cross-artifact traceable. Existing corpora either carry legal constraints or inherit hallucination artifacts from the generating LLMs, silently corrupting results when timestamps or facts contradict across documents and reinforcing those errors during training. We present OrgForge, an open-source multi-agent simulation framework that enforces a strict physics-cognition boundary: a deterministic Python engine maintains a SimEvent ground-truth bus while LLMs generate only surface prose. OrgForge simulates the organizational processes that produce documents, not the documents themselves. Engineers leave mid-sprint, triggering incident handoffs and CRM ownership lapses. Knowledge gaps emerge when under-documented systems break and recover through…
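
The physics-cognition boundary described in the abstract can be illustrated with a minimal sketch. This is not OrgForge's actual code: apart from the name SimEvent, which the abstract mentions, every class, function, and field below (EventBus, run_engine, stub_prose, render_artifacts, is_traceable) is a hypothetical stand-in, and the LLM is replaced by a plain formatting function. The assumption is that a deterministic engine alone writes facts to an append-only bus, while the prose writer only reads them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SimEvent:
    """One ground-truth fact. Field names here are illustrative assumptions."""
    event_id: int
    timestamp: datetime
    kind: str
    actor: str
    detail: str


class EventBus:
    """Hypothetical append-only log: the only source of facts for artifacts."""

    def __init__(self) -> None:
        self._events: List[SimEvent] = []

    def emit(self, timestamp: datetime, kind: str, actor: str, detail: str) -> SimEvent:
        event = SimEvent(len(self._events), timestamp, kind, actor, detail)
        self._events.append(event)
        return event

    def events(self) -> List[SimEvent]:
        # Return a copy so callers cannot mutate the log.
        return list(self._events)


def run_engine(bus: EventBus, start: datetime) -> None:
    """Deterministic 'physics': a departure causes an incident handoff."""
    bus.emit(start, "departure", "alice", "left mid-sprint")
    bus.emit(start + timedelta(hours=2), "incident_handoff", "bob",
             "took over the open incident from alice")


def stub_prose(event: SimEvent) -> str:
    """Stand-in for an LLM call: it may vary wording but cannot change facts."""
    return f"{event.actor} {event.detail}."


def render_artifacts(bus: EventBus, writer: Callable[[SimEvent], str]) -> List[Dict]:
    """'Cognition' side: prose comes from the writer, metadata from the bus."""
    return [
        {"source_event": e.event_id, "timestamp": e.timestamp, "text": writer(e)}
        for e in bus.events()
    ]


def is_traceable(artifacts: List[Dict], bus: EventBus) -> bool:
    """Check that every artifact points at a bus event with a matching timestamp."""
    by_id = {e.event_id: e for e in bus.events()}
    return all(
        a["source_event"] in by_id
        and a["timestamp"] == by_id[a["source_event"]].timestamp
        for a in artifacts
    )
```

Because each artifact carries the identifier of the event it was rendered from, a contradiction such as a timestamp that drifts between documents can be detected mechanically by comparing the artifact against the bus, which is the kind of verifiability the abstract argues for.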

Code & Models

Datasets

aeriesec/orgforge
dataset· 457 dl
