ChemLabs on ChemO: A Multi-Agent System for Multimodal Reasoning on IChO 2025
Qiang Xu, Shengyuan Bai, Leqing Chen, Zijing Liu, Yu Li

TL;DR
This paper introduces ChemO, a challenging multimodal chemistry benchmark based on IChO 2025, and proposes ChemLabs, a multi-agent system that significantly improves automated chemical reasoning and problem-solving performance.
Contribution
The paper presents ChemO, a multimodal chemistry benchmark built from IChO 2025, and ChemLabs, a hierarchical multi-agent framework with specialized agents for problem decomposition, perception, reasoning, and auditing that achieves state-of-the-art results on the benchmark.
Findings
ChemLabs combined with SVE significantly outperforms existing models.
ChemLabs achieved a score of 93.6, surpassing the human gold medal threshold.
ChemO provides a new, challenging dataset for evaluating chemical reasoning in AI systems.
Abstract
Olympiad-level benchmarks in mathematics and physics are crucial testbeds for advanced AI reasoning, but chemistry, with its unique multimodal symbolic language, has remained an open challenge. We introduce ChemO, a new benchmark built from the International Chemistry Olympiad (IChO) 2025. ChemO features two key innovations for automated assessment: Assessment-Equivalent Reformulation (AER), which converts problems requiring visual outputs (e.g., drawing molecules) into computationally tractable formats, and Structured Visual Enhancement (SVE), a diagnostic mechanism to disentangle a model's visual perception capabilities from its core chemical reasoning. To tackle this benchmark, we propose ChemLabs, a hierarchical multi-agent framework that mimics human expert collaboration through specialized agents for problem decomposition, perception, reasoning, and auditing. Experiments on…
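As a rough illustration of the hierarchical flow the abstract describes (decomposition, then perception, reasoning, and auditing), the stages can be sketched as a simple pipeline. This is a minimal sketch only: the names `SubTask`, `decompose`, `perceive`, `reason`, `audit`, and `run_pipeline` are hypothetical, the agents are stubbed with plain string operations instead of model calls, and nothing here is taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTask:
    """One unit of work passed between the (hypothetical) agents."""
    text: str
    observations: List[str] = field(default_factory=list)
    answer: str = ""
    approved: bool = False


def decompose(problem: str) -> List[SubTask]:
    # Stub decomposition agent: split the problem on ';' into sub-tasks.
    return [SubTask(text=part.strip()) for part in problem.split(";") if part.strip()]


def perceive(task: SubTask) -> SubTask:
    # Stub perception agent: a real one would read figures or structures.
    task.observations.append(f"observed: {task.text}")
    return task


def reason(task: SubTask) -> SubTask:
    # Stub reasoning agent: a real one would call a model on the observations.
    task.answer = f"answer to '{task.text}' from {len(task.observations)} observation(s)"
    return task


def audit(task: SubTask) -> SubTask:
    # Stub auditing agent: approve only if perception and reasoning both ran.
    task.approved = bool(task.observations) and bool(task.answer)
    return task


def run_pipeline(problem: str) -> List[SubTask]:
    # Each sub-task passes through perception, reasoning, and auditing in order.
    results = []
    for task in decompose(problem):
        for stage in (perceive, reason, audit):
            task = stage(task)
        results.append(task)
    return results


if __name__ == "__main__":
    for t in run_pipeline("name the compound; balance the equation"):
        print(t.text, "|", t.approved)
```

Keeping each stage as a function over a shared `SubTask` record is one possible design choice; it makes it easy to swap a stub for a model-backed agent without changing the pipeline itself.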
Taxonomy
Topics: Machine Learning in Materials Science · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
