Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Research
Zhenhui Ou, Dawei Li, Zhen Tan, Wenlin Li, Huan Liu, Siyuan Song

TL;DR
This paper introduces a comprehensive multi-level construction safety dataset (CSDataset) sourced from OSHA, enabling advanced analysis and machine learning applications to improve safety outcomes in civil engineering.
Contribution
The paper presents a novel, large-scale, multi-level construction safety dataset integrating structured and unstructured data, facilitating diverse research and machine learning approaches.
Findings
Complaint-driven inspections are linked to 17.3% fewer incidents.
Benchmarking of preliminary approaches demonstrates dataset utility.
Cross-level analysis offers new safety insights.
Abstract
Construction safety research is a critical field in civil engineering, aiming to mitigate risks and prevent injuries through the analysis of site conditions and human factors. However, the limited volume and lack of diversity in existing construction safety datasets pose significant challenges to conducting in-depth analyses. To address this research gap, this paper introduces the Construction Safety Dataset (CSDataset), a well-organized, comprehensive multi-level dataset that encompasses incidents, inspections, and violations sourced from the Occupational Safety and Health Administration (OSHA). This dataset uniquely integrates structured attributes with unstructured narratives, facilitating a wide range of approaches driven by machine learning and large language models. We also benchmark preliminary approaches and conduct various cross-level analyses using our dataset,…
