Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework
Christo Zietsman

TL;DR
This study introduces a five-principle framework for evaluating the structural completeness of AI governance prompts. Applied to practitioner-authored prompts, it reveals substantial structural gaps and suggests that automated static analysis could help address them.
Contribution
The paper presents a novel, theory-grounded evaluation framework for governance prompts and applies it to 34 publicly available AGENTS.md files from GitHub, uncovering common structural deficiencies.
Findings
37% of evaluated file-model pairs scored below the structural completeness threshold
Data classification and assessment rubric criteria are the most frequently absent
Automated static analysis could improve prompt quality
Abstract
AI governance programmes increasingly rely on natural language prompts to constrain and direct AI agent behaviour. These prompts function as executable specifications: they define the agent's mandate, scope, and quality criteria. Despite this role, no systematic framework exists for evaluating whether a governance prompt is structurally complete. We introduce a five-principle evaluation framework grounded in computability theory, proof theory, and Bayesian epistemology, and apply it to an empirical corpus of 34 publicly available AGENTS.md governance files sourced from GitHub. Our evaluation reveals that 37% of evaluated file-model pairs score below the structural completeness threshold, with data classification and assessment rubric criteria most frequently absent. These results suggest that practitioner-authored governance prompts exhibit consistent structural patterns that automated…
