Scalable Delphi: Large Language Models for Structured Risk Estimation
Tobias Lorenz, Mario Fritz

TL;DR
This paper introduces Scalable Delphi, a method using large language models to perform structured risk estimation efficiently, matching expert judgments and reducing assessment time from months to minutes.
Contribution
It adapts the Delphi method for LLMs with diverse personas and iterative refinement, enabling scalable and calibrated risk assessments in high-stakes domains.
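The iterative refinement loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function `run_delphi`, the `Estimator` signature, the persona names, and the numeric priors are all hypothetical, and a deterministic stub stands in for the LLM call. Rationale sharing is omitted for brevity; only numeric estimates are fed back between rounds.

```python
from statistics import median
from typing import Callable, Dict, List

# Hypothetical signature: (persona, question, peer_estimates) -> probability in [0, 1].
# In a real system this would wrap an LLM call; here it is left abstract.
Estimator = Callable[[str, str, List[float]], float]


def run_delphi(question: str, personas: List[str], estimate: Estimator,
               rounds: int = 3) -> List[Dict[str, float]]:
    """Run a simplified Delphi loop and return every round's estimates."""
    history: List[Dict[str, float]] = []
    feedback: List[float] = []  # round 1 is independent: no peer feedback yet
    for _ in range(rounds):
        current = {p: estimate(p, question, feedback) for p in personas}
        history.append(current)
        # Anonymised feedback for the next round: the estimates without names.
        feedback = list(current.values())
    return history


def stub_estimator(persona: str, question: str, feedback: List[float]) -> float:
    """Deterministic stand-in for an LLM panelist (illustrative values only)."""
    priors = {"red-team operator": 0.30, "defender": 0.10, "policy analyst": 0.20}
    prior = priors[persona]
    if not feedback:
        return prior
    # Move halfway from the persona's prior toward the panel median.
    return 0.5 * prior + 0.5 * median(feedback)


def spread(estimates: Dict[str, float]) -> float:
    """Range of the panel's estimates, a crude measure of disagreement."""
    return max(estimates.values()) - min(estimates.values())


personas = ["red-team operator", "defender", "policy analyst"]
history = run_delphi("Probability of event X?", personas, stub_estimator)
for i, panel in enumerate(history, start=1):
    print(f"round {i}: median={median(panel.values()):.2f} spread={spread(panel):.2f}")
```

With this stub the panel's disagreement shrinks after the first round of feedback, which is the qualitative behaviour a Delphi process aims for. A real panel would also exchange written rationales and could use a different aggregation rule than the median.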
Findings
LLM panels achieve high correlation with ground truth (r=0.87-0.95)
Elicitation time reduced from months to minutes
LLM-based elicitation aligns well with human expert panels
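For readers unfamiliar with the correlation figure quoted in the findings, the sketch below computes a Pearson correlation coefficient between panel estimates and reference values. It assumes that the reported r is Pearson's r, which this summary does not state explicitly. The function name `pearson_r` and the sample numbers are made up for illustration and are not data from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root of the summed squared deviations for each sequence.
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Placeholder values, NOT results from the paper.
panel_estimates = [0.12, 0.35, 0.50, 0.71, 0.90]
reference_values = [0.10, 0.30, 0.55, 0.65, 0.95]
print(f"r = {pearson_r(panel_estimates, reference_values):.2f}")
```

A value near 1 means the panel ranks and scales the items much like the reference does. It does not by itself show that the estimates are calibrated, which is why the abstract lists calibration as a separate condition.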
Abstract
Quantitative risk assessment in high-stakes domains relies on structured expert elicitation to estimate unobservable properties. The gold standard, the Delphi method, produces calibrated, auditable judgments but requires months of coordination and specialist time, placing rigorous risk assessment out of reach for most applications. We investigate whether Large Language Models (LLMs) can serve as scalable proxies for structured expert elicitation. We propose Scalable Delphi, adapting the classical protocol for LLMs with diverse expert personas, iterative refinement, and rationale sharing. Because target quantities are typically unobservable, we develop an evaluation framework based on necessary conditions: calibration against verifiable proxies, sensitivity to evidence, and alignment with human expert judgment. We evaluate in the domain of AI-augmented cybersecurity risk, using three…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Persona Design and Applications · Ethics and Social Impacts of AI
