LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements

Jianwei Wang; Mengqi Wang; Yinsi Zhou; Zhenchang Xing; Qing Liu; Xiwei Xu; Wenjie Zhang; and Liming Zhu

arXiv:2505.22959 · cs.CL · May 30, 2025

TL;DR

This paper introduces HSE-Bench, a benchmark dataset for evaluating LLMs in HSE compliance assessment, revealing current models' reliance on semantic matching and proposing a new prompting method, RoE, to improve reasoning accuracy.

Contribution

The paper presents the first benchmark dataset for LLM-based HSE compliance assessment and introduces RoE, a prompting technique that enhances the reasoning capabilities of LLMs in this domain.

Findings

1. LLMs perform well but mainly rely on semantic matching.
2. Current LLM reasoning traces lack systematic legal reasoning.
3. RoE improves LLM decision accuracy by simulating expert reasoning.

Abstract

Health, Safety, and Environment (HSE) compliance assessment demands dynamic, real-time decision-making under complicated regulations and complex human-machine-environment interactions. While large language models (LLMs) hold significant potential for decision intelligence and contextual dialogue, their capacity for domain-specific knowledge in HSE and structured legal reasoning remains underexplored. We introduce HSE-Bench, the first benchmark dataset designed to evaluate the HSE compliance assessment capabilities of LLMs. HSE-Bench comprises over 1,000 manually curated questions drawn from regulations, court cases, safety exams, and fieldwork videos, and integrates a reasoning flow based on Issue spotting, rule Recall, rule Application, and rule Conclusion (IRAC) to assess the holistic reasoning pipeline. We conduct extensive evaluations on different prompting strategies and more than 10…
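To make the IRAC reasoning flow more concrete, here is a minimal sketch, assuming a multiple-choice item and a model asked to answer in four labelled sections. It is not HSE-Bench's actual prompt format or data schema: `HSEQuestion`, `IRAC_STAGES`, `build_irac_prompt`, and `parse_irac_response` are hypothetical names introduced only for illustration. Splitting the reply by stage is what lets an evaluator inspect each step of the pipeline rather than only the final answer.

```python
import string
from dataclasses import dataclass

# Hypothetical section labels following the IRAC flow:
# Issue spotting, rule Recall, rule Application, rule Conclusion.
IRAC_STAGES = ("Issue", "Rule", "Application", "Conclusion")


@dataclass
class HSEQuestion:
    """Hypothetical record for one benchmark item (not the official HSE-Bench schema)."""
    scenario: str
    question: str
    options: list[str]


def build_irac_prompt(item: HSEQuestion) -> str:
    """Build a prompt asking the model to answer in four labelled IRAC sections."""
    lines = [f"Scenario: {item.scenario}", f"Question: {item.question}", "Options:"]
    # Label options A, B, C, ... in order.
    for letter, option in zip(string.ascii_uppercase, item.options):
        lines.append(f"  {letter}. {option}")
    lines.append("Answer using exactly these sections, one per line:")
    lines += [f"{stage}: ..." for stage in IRAC_STAGES]
    return "\n".join(lines)


def parse_irac_response(text: str) -> dict[str, str]:
    """Split a model reply into its IRAC sections so each stage can be scored separately."""
    sections: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        head, sep, rest = line.partition(":")
        if sep and head in IRAC_STAGES:
            # A new labelled section starts on this line.
            current = head
            sections[current] = rest.strip()
        elif current is not None and line:
            # Continuation line: append to the section currently being read.
            sections[current] += " " + line
    return sections
```

For example, a reply whose Application section wraps onto a second line is still collected under `"Application"`, and the `"Conclusion"` entry can be compared against the gold option independently of the earlier reasoning steps.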

Code & Models

Repositories

mengqiwang1/hse-bench (official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Topic Modeling · Multimodal Machine Learning Applications