Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification

Ananth Sriram; Neel Mokaria; Rajveer Singh

arXiv:2605.19869 · cs.CV · May 20, 2026

TL;DR

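This paper introduces a passive, end-of-shift construction safety monitoring pipeline that processes body-worn and wall-mounted camera video in three stages (YOLO11 detection, SAM 3 segmentation refinement, and Qwen3-VL-8B-Instruct verification) and uses persona-scaffolded prompts to improve violation-detection precision and reduce hallucinations.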

Contribution

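The principal contribution is the Stage 3 prompt design: a method-prompted, persona-scaffolded, three-pass adversarial chain-of-thought protocol for compliance verification and hallucination control, layered on a three-stage detection, segmentation, and verification architecture.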

Findings

01 12% precision improvement with persona-scaffolded prompts

02 Largest gains on hallucination-prone violation categories

03 System maps violations to OSHA standards and scores ergonomic risks

Abstract

Construction remains the deadliest industry sector in the United States, with 1,055 fatal worker injuries recorded in 2023, most of them preventable. Existing monitoring approaches are expensive, require real-time human operators, or address only a narrow subset of violations. This paper presents a passive, end-of-shift construction safety monitoring pipeline that processes video from POV body-worn and fixed wall-mounted cameras through a three-stage architecture: (1) fine-tuned YOLO11 for primary PPE and hazard detection, (2) SAM 3 for segmentation refinement and worker deduplication, and (3) Qwen3-VL-8B-Instruct with a method-prompted, persona-scaffolded three-pass adversarial chain-of-thought protocol for compliance verification and hallucination control. The principal contribution is the Stage 3 prompt design: professional persona backstories following the method-actor framing drive…
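The three-stage flow described in the abstract can be pictured as a simple chain: detect candidates, refine and deduplicate them, then keep only what a verifier confirms. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The `Detection` fields, the label strings, the stage signatures, and all `stub_*` functions are hypothetical placeholders; no real YOLO11, SAM 3, or Qwen3-VL call is made, and the three-pass prompting protocol inside Stage 3 is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Detection:
    """One candidate violation. All fields are hypothetical placeholders."""
    worker_id: int     # assumed to identify a worker after deduplication
    label: str         # e.g. "no_hard_hat" (illustrative label, not from the paper)
    confidence: float


# Assumed stage signatures; the paper's actual interfaces are not given here.
Detector = Callable[[object], List[Detection]]                    # Stage 1 stand-in
Refiner = Callable[[object, List[Detection]], List[Detection]]    # Stage 2 stand-in
Verifier = Callable[[object, Detection], bool]                    # Stage 3 stand-in


def run_pipeline(frames: Iterable[object], detect: Detector,
                 refine: Refiner, verify: Verifier) -> List[Detection]:
    """Run each frame through detect -> refine -> verify and collect survivors."""
    confirmed: List[Detection] = []
    for frame in frames:
        candidates = detect(frame)
        refined = refine(frame, candidates)
        confirmed.extend(d for d in refined if verify(frame, d))
    return confirmed


def stub_detect(frame: object) -> List[Detection]:
    # Stand-in for the detector: returns fixed candidates, including a duplicate.
    return [
        Detection(0, "no_hard_hat", 0.9),
        Detection(0, "no_hard_hat", 0.8),
        Detection(1, "no_vest", 0.4),
    ]


def stub_refine(frame: object, dets: List[Detection]) -> List[Detection]:
    # Stand-in for deduplication: keep the highest-confidence detection
    # per (worker_id, label) pair.
    best = {}
    for d in dets:
        key = (d.worker_id, d.label)
        if key not in best or d.confidence > best[key].confidence:
            best[key] = d
    return list(best.values())


def stub_verify(frame: object, det: Detection) -> bool:
    # Stand-in for the verifier: a plain threshold, purely for illustration.
    return det.confidence >= 0.5


if __name__ == "__main__":
    print(run_pipeline(["frame-0"], stub_detect, stub_refine, stub_verify))
```

Passing the stages in as callables keeps the control flow separate from any particular model, so each stub could be swapped for a real component without changing `run_pipeline`.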
