Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification
Ananth Sriram, Neel Mokaria, Rajveer Singh

TL;DR
This paper introduces a passive, end-of-shift construction safety monitoring pipeline that combines object detection, segmentation, and a vision-language model driven by persona-scaffolded prompts to verify violations more precisely and reduce hallucinations.
Contribution
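It presents a three-stage architecture (YOLO11 detection, SAM 3 segmentation refinement, and Qwen3-VL-8B-Instruct verification) whose principal contribution is the Stage 3 prompt design: a persona-scaffolded, three-pass adversarial chain-of-thought protocol for compliance verification and hallucination control.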
Findings
12% precision improvement with persona-scaffolded prompts
Largest gains on hallucination-prone violation categories
System maps violations to OSHA standards and scores ergonomic risks
Abstract
Construction remains the deadliest industry sector in the United States, with 1,055 fatal worker injuries recorded in 2023, most of them preventable. Existing monitoring approaches are expensive, require real-time human operators, or address only a narrow subset of violations. This paper presents a passive, end-of-shift construction safety monitoring pipeline that processes video from POV body-worn and fixed wall-mounted cameras through a three-stage architecture: (1) fine-tuned YOLO11 for primary PPE and hazard detection, (2) SAM 3 for segmentation refinement and worker deduplication, and (3) Qwen3-VL-8B-Instruct with a method-prompted, persona-scaffolded three-pass adversarial chain-of-thought protocol for compliance verification and hallucination control. The principal contribution is the Stage 3 prompt design: professional persona backstories following the method-actor framing drive…
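The control flow of the three stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stage models are replaced by injected callables so the sketch runs without any weights, and the pass names in `PASSES`, the `Candidate` fields, and the rule that a violation must survive every pass are all assumptions made for the example, since the abstract as shown does not spell them out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    worker_id: int   # hypothetical identity assigned after segmentation
    label: str       # hypothetical violation label, e.g. "no_hard_hat"
    score: float     # detector confidence


# Stand-ins for the real models (YOLO11, SAM 3, Qwen3-VL-8B-Instruct).
Detector = Callable[[object], List[Candidate]]
Refiner = Callable[[object, List[Candidate]], List[Candidate]]
Verifier = Callable[[object, Candidate, str], bool]

# Assumed names for the three verification passes; the source does not name them.
PASSES = ("propose", "challenge", "adjudicate")


def dedupe(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the highest-scoring candidate per (worker, label) pair."""
    best = {}
    for c in candidates:
        key = (c.worker_id, c.label)
        if key not in best or c.score > best[key].score:
            best[key] = c
    return list(best.values())


def run_pipeline(frame, detect: Detector, refine: Refiner, verify: Verifier):
    """Stage 1 detects, Stage 2 refines and deduplicates, Stage 3 verifies."""
    candidates = detect(frame)
    candidates = dedupe(refine(frame, candidates))
    # Assumption: a violation is reported only if it survives every pass.
    return [c for c in candidates if all(verify(frame, c, p) for p in PASSES)]
```

Requiring agreement across all passes trades some recall for precision, which is one plausible way a multi-pass check could suppress hallucinated violations; the paper's actual decision rule may differ.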
