STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports

Tegan McCaslin; Jide Alaga; Samira Nedungadi; Seth Donoughe; Tom Reed; Rishi Bommasani; Chris Painter; and Luca Righetti

arXiv:2508.09853·cs.CY·September 4, 2025

TL;DR

STREAM is a standard for reporting AI evaluation results more transparently and clearly in model reports, initially for chemical and biological (ChemBio) benchmarks, so that readers can trust the results and third parties can assess the rigor of the evaluations.

Contribution

The paper introduces a practical reporting standard, with best practices and templates, for transparent disclosure of AI evaluation results in ChemBio contexts.

Findings

01 Standard improves clarity in model reports

02 Template facilitates easier adoption by developers

03 Enhances trust through transparency in evaluations

Abstract

Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations, including what they test, how they are conducted, and how their results inform decisions, is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource to help AI developers present evaluation results more clearly, and (2) help third parties identify whether model reports provide sufficient detail to assess the rigor of the ChemBio evaluations. We concretely demonstrate our…
