ChEmREF: Evaluating Language Model Readiness for Chemical Emergency Response
Risha Surana, Qinyuan Ye, Swabha Swayamdipta

TL;DR
This paper introduces ChEmREF, a comprehensive benchmark to evaluate language models' ability to assist in chemical emergency response tasks, highlighting current capabilities and limitations.
Contribution
The paper presents ChEmREF, a new benchmark with three tasks (chemical representation translation, emergency response generation, and domain knowledge question answering) for assessing language models in HAZMAT scenarios.
Findings
Models achieved 68.0% accuracy in chemical representation translation.
Models scored 52.7% on incident response recommendations.
Models reached 63.9% accuracy on chemical safety exams.
Abstract
Emergency responders managing hazardous material (HAZMAT) incidents face critical, time-sensitive decisions and must manually navigate extensive chemical guidelines. We investigate whether today's language models can assist responders by rapidly and reliably understanding critical information, identifying hazards, and providing recommendations. We introduce the Chemical Emergency Response Evaluation Framework (ChEmREF), a new benchmark comprising questions on 1,035 HAZMAT chemicals drawn from the Emergency Response Guidebook and the PubChem Database. ChEmREF is organized into three tasks: (1) translation of chemical representations between structured and unstructured forms (e.g., converting C2H6O to ethanol), (2) emergency response generation (e.g., recommending appropriate evacuation distances), and (3) domain knowledge question answering from chemical safety and certification exams. Our best…
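The findings above are reported as accuracies. As a rough illustration of how a translation item could be scored, here is a minimal sketch assuming exact-match accuracy after light normalization. The `Item` schema, the task labels, and the `normalize` and `exact_match_accuracy` helpers are all hypothetical; the summary does not state ChEmREF's actual data format or scoring method.

```python
from dataclasses import dataclass


# Hypothetical item schema: field names and task labels are illustrative
# assumptions, not ChEmREF's actual data format.
@dataclass
class Item:
    task: str        # e.g. "representation", "response", "knowledge_qa"
    prompt: str
    reference: str


def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so "Ethanol " matches "ethanol".
    return " ".join(text.lower().split())


def exact_match_accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of predictions that exactly match the reference after normalization."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(item.reference)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


items = [
    Item("representation", "Name the compound with formula C2H6O.", "ethanol"),
    Item("representation", "Give the molecular formula of ethanol.", "C2H6O"),
]
print(exact_match_accuracy(items, ["Ethanol ", "C2H5OH"]))  # 0.5
```

Exact match is deliberately strict here: C2H5OH is a valid condensed formula for ethanol, yet this sketch scores it as wrong. A real benchmark might therefore prefer a structure-aware comparison for representation tasks.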
Taxonomy
Topics: Chemical Safety and Risk Management · Risk and Safety Analysis · Machine Learning in Materials Science
