ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

Jueon Park; Wonjune Jang; Chanhwi Kim; Yein Park; Jaewoo Kang

arXiv:2604.06264·q-bio.QM·April 9, 2026



TL;DR

ToxReason is a new benchmark based on the Adverse Outcome Pathway framework that evaluates the ability of models to perform mechanistic reasoning for chemical toxicity prediction across multiple organs.

Contribution

The paper introduces ToxReason, a benchmark that assesses both toxicity prediction and mechanistic reasoning grounded in biological pathways, filling a gap in existing evaluation methods.

Findings

01 Strong predictive performance does not guarantee reliable mechanistic reasoning.

02 Reasoning-aware training enhances both mechanistic understanding and toxicity prediction.

03 ToxReason reveals the need for integrating reasoning into model evaluation and training.

Abstract

Recent advances in large language models (LLMs) have enabled molecular reasoning for property prediction. However, toxicity arises from complex biological mechanisms beyond chemical structure, necessitating mechanistic reasoning for reliable prediction. Despite its importance, current benchmarks fail to systematically evaluate this capability. LLMs can generate fluent but biologically unfaithful explanations, making it difficult to assess whether predicted toxicities are grounded in valid mechanisms. To bridge this gap, we introduce ToxReason, a benchmark grounded in the Adverse Outcome Pathway (AOP) that evaluates organ-level toxicity reasoning across multiple organs. ToxReason integrates experimental drug-target interaction evidence with toxicity labels, requiring models to infer both toxic outcomes and their underlying mechanisms from Molecular Initiating Event (MIE) to Adverse…
