Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
Farouq Sammour, Jia Xu, Xi Wang, Mo Hu, Zhenyu Zhang

TL;DR
This study systematically evaluates the performance of GPT-3.5 and GPT-4o on standardized BCSP safety exams, highlighting their strengths, their limitations, and the impact of prompt engineering on responsible AI integration in construction safety.
Contribution
It provides a comprehensive assessment of LLMs in safety contexts, identifying performance gaps and practical prompt strategies to enhance responsible AI deployment in construction safety.
Findings
GPT-4o achieves 84.6% accuracy; GPT-3.5 reaches 73.8%.
Models excel in safety management but struggle with science and emergency topics.
Prompt engineering can improve accuracy by up to 13.5%.
Abstract
Construction remains one of the most hazardous sectors. Recent advancements in AI, particularly Large Language Models (LLMs), offer promising opportunities for enhancing workplace safety. However, responsible integration of LLMs requires systematic evaluation, as deploying them without understanding their capabilities and limitations risks generating inaccurate information, fostering misplaced confidence, and compromising worker safety. This study evaluates the performance of two widely used LLMs, GPT-3.5 and GPT-4o, across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Using 385 questions spanning seven safety knowledge areas, the study analyzes the models' accuracy, consistency, and reliability. Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%.…
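The accuracy and consistency measures mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the record layout, the function names (overall_accuracy, accuracy_by_area, consistency), the knowledge-area labels, and the toy answers are all assumptions made for illustration, and exact matching on the answer letter is assumed as the scoring rule.

```python
from collections import defaultdict


def _normalize(answer):
    """Normalize an answer letter so that 'b ' and 'B' compare equal."""
    return answer.strip().upper()


def overall_accuracy(records):
    """Fraction of questions answered correctly.

    records: list of (area, predicted, correct) tuples (hypothetical layout).
    """
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for _, pred, gold in records
               if _normalize(pred) == _normalize(gold))
    return hits / len(records)


def accuracy_by_area(records):
    """Accuracy computed separately for each knowledge area."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for area, pred, gold in records:
        totals[area] += 1
        if _normalize(pred) == _normalize(gold):
            hits[area] += 1
    return {area: hits[area] / totals[area] for area in totals}


def consistency(runs):
    """Fraction of questions that receive the same answer in every run.

    runs: list of answer lists, one list per repeated run, aligned by question.
    """
    if not runs or not runs[0]:
        return 0.0
    n_questions = len(runs[0])
    stable = 0
    for q in range(n_questions):
        distinct = {_normalize(run[q]) for run in runs}
        if len(distinct) == 1:
            stable += 1
    return stable / n_questions


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    toy_records = [
        ("Safety Management", "A", "A"),
        ("Safety Management", "b", "B"),
        ("Emergency Response", "C", "D"),
        ("Emergency Response", "D", "D"),
    ]
    toy_runs = [["A", "B", "C"], ["A", "B", "D"]]
    print(overall_accuracy(toy_records))
    print(accuracy_by_area(toy_records))
    print(consistency(toy_runs))
```

Reporting accuracy per knowledge area alongside the overall score is what makes gaps visible, such as the weaker performance on science and emergency topics noted in the findings. An overall figure alone would hide them.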
Taxonomy
Topics: Occupational Health and Safety Research · Risk and Safety Analysis · BIM and Construction Integration
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Layer Normalization · Adam · Attention Dropout · Multi-Head Attention · Residual Connection
