Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering

Farouq Sammour; Jia Xu; Xi Wang; Mo Hu; Zhenyu Zhang

arXiv:2411.08320 · cs.AI · November 14, 2024 · 2 cites

TL;DR

This study systematically evaluates the performance of GPT-3.5 and GPT-4o on safety certification exams, highlighting their strengths, their limitations, and the impact of prompt engineering on responsible AI integration in construction safety.

Contribution

It provides a comprehensive assessment of LLMs in safety contexts, identifying performance gaps and practical prompt strategies to enhance responsible AI deployment in construction safety.

Findings

01

GPT-4o achieves 84.6% accuracy; GPT-3.5 reaches 73.8%.

02

Models excel in safety management but struggle with science and emergency topics.

03

Prompt engineering can improve accuracy by up to 13.5%.

Abstract

Construction remains one of the most hazardous sectors. Recent advancements in AI, particularly Large Language Models (LLMs), offer promising opportunities for enhancing workplace safety. However, responsible integration of LLMs requires systematic evaluation, as deploying them without understanding their capabilities and limitations risks generating inaccurate information, fostering misplaced confidence, and compromising worker safety. This study evaluates the performance of two widely used LLMs, GPT-3.5 and GPT-4o, across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Using 385 questions spanning seven safety knowledge areas, the study analyzes the models' accuracy, consistency, and reliability. Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%.…
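The abstract names accuracy and consistency as evaluation measures but, as excerpted here, does not define them. As a minimal sketch only, assuming multiple-choice answers recorded as letter strings and several repeated runs per model, the two measures could be computed as below. The function names, the "all runs agree" definition of consistency, and the toy data are illustrative assumptions, not the authors' method or results.

```python
from collections import Counter


def accuracy(predictions, answer_key):
    """Fraction of questions whose predicted choice matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have equal length")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)


def consistency(runs):
    """Fraction of questions on which every repeated run gave the same answer.

    Assumed definition for illustration; the paper may measure this differently.
    """
    per_question = list(zip(*runs))  # regroup answers question by question
    stable = sum(len(set(answers)) == 1 for answers in per_question)
    return stable / len(per_question)


def majority_vote(runs):
    """Most frequent answer per question across the repeated runs."""
    return [Counter(answers).most_common(1)[0][0] for answers in zip(*runs)]


# Toy data (hypothetical, not taken from the BCSP exams).
answer_key = ["A", "C", "B", "D"]
runs = [
    ["A", "C", "B", "A"],
    ["A", "C", "D", "A"],
    ["A", "C", "B", "A"],
]

print(accuracy(runs[0], answer_key))            # 0.75
print(consistency(runs))                        # 0.75
print(accuracy(majority_vote(runs), answer_key))  # 0.75
```

Separating the two measures matters in this toy case: the fourth question is answered identically in every run, so it counts as consistent, yet it is wrong every time.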

Taxonomy

Topics: Occupational Health and Safety Research · Risk and Safety Analysis · BIM and Construction Integration

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Layer Normalization · Adam · Attention Dropout · Multi-Head Attention · Residual Connection