Monte Carlo Expected Threat (MOCET) Scoring
Joseph Kim, Saahith Potluri

TL;DR
This paper introduces MOCET, a scalable metric for quantifying the real-world risks posed by AI models, particularly in biosecurity, to help stakeholders carry out safety assessments.
Contribution
The paper presents MOCET, an interpretable and scalable metric for AI safety risk evaluation. It addresses two gaps in existing benchmarks: they do not contextualize real-world risk, and they do not scale with the rapid advancement of models.
Findings
MOCET effectively quantifies real-world risks.
MOCET is scalable and open-ended.
MOCET improves safety assessments for AI models.
Abstract
Evaluating and measuring AI Safety Level (ASL) threats are crucial for guiding stakeholders to implement safeguards that keep risks within acceptable limits. ASL-3+ models present a unique risk in their ability to uplift novice non-state actors, especially in the realm of biosecurity. Existing evaluation metrics, such as LAB-Bench, BioLP-bench, and WMDP, can reliably assess model uplift and domain knowledge. However, metrics that better contextualize "real-world risks" are needed to inform the safety case for LLMs, along with scalable, open-ended metrics to keep pace with their rapid advancements. To address both gaps, we introduce MOCET, an interpretable and doubly-scalable metric (automatable and open-ended) that can quantify real-world risks.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Information and Cyber Security
