Monte Carlo Expected Threat (MOCET) Scoring

Joseph Kim; Saahith Potluri

arXiv:2511.16823·cs.LG·November 24, 2025


TL;DR

This paper introduces MOCET, a novel, scalable metric for evaluating and quantifying the real-world risks posed by AI models, particularly in biosecurity, to help stakeholders carry out safety assessments.

Contribution

The paper presents MOCET, an interpretable and scalable metric that improves risk evaluation for AI safety by addressing limitations of existing benchmarks such as LAB-Bench, BioLP-bench, and WMDP.

Findings

01

MOCET effectively quantifies real-world risks.

02

MOCET is scalable and open-ended.

03

MOCET improves safety assessments for AI models.

Abstract

Evaluating and measuring AI Safety Level (ASL) threats are crucial for guiding stakeholders to implement safeguards that keep risks within acceptable limits. ASL-3+ models present a unique risk in their ability to uplift novice non-state actors, especially in the realm of biosecurity. Existing evaluation metrics, such as LAB-Bench, BioLP-bench, and WMDP, can reliably assess model uplift and domain knowledge. However, metrics that better contextualize "real-world risks" are needed to inform the safety case for LLMs, along with scalable, open-ended metrics to keep pace with their rapid advancements. To address both gaps, we introduce MOCET, an interpretable and doubly-scalable metric (automatable and open-ended) that can quantify real-world risks.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy · Information and Cyber Security