PolyReal: A Benchmark for Real-World Polymer Science Workflows
Wanhao Liu, Weida Wang, Jiaqing Xie, Suorong Yang, Jue Wang, Benteng Chen, Guangtao Mei, Zonglin Yang, Shufei Zhang, Yuchun Mo, Lang Cheng, Jin Zeng, Houqiang Li, Wanli Ouyang, and Yuqiang Li

TL;DR
PolyReal is a comprehensive benchmark designed to evaluate multimodal language models across the full spectrum of real-world polymer science workflows, highlighting strengths and gaps in current AI capabilities.
Contribution
PolyReal introduces a novel, practice-grounded benchmark covering five key aspects of polymer experimentation to systematically assess MLLMs on real-world scientific tasks.
Findings
Models excel at knowledge-based reasoning but struggle with practical tasks like safety analysis and data extraction.
MLLMs show a significant capability gap between theoretical understanding and practical application.
PolyReal reveals the need for models that are better able to handle real-world scientific workflows.
Abstract
Multimodal Large Language Models (MLLMs) excel in general domains but struggle with complex, real-world science. We posit that polymer science, an interdisciplinary field spanning chemistry, physics, biology, and engineering, is an ideal high-stakes testbed due to its diverse multimodal data. Yet, existing benchmarks related to polymer science largely overlook real-world workflows, limiting their practical utility and failing to systematically evaluate MLLMs across the full, practice-grounded lifecycle of experimentation. We introduce PolyReal, a novel multimodal benchmark grounded in real-world scientific practices to evaluate MLLMs on the full lifecycle of polymer experimentation. It covers five critical capabilities: (1) foundational knowledge application; (2) lab safety analysis; (3) experiment mechanism reasoning; (4) raw data extraction; and (5) performance & application…
