Toward Safe and Responsible AI Agents: A Three-Pillar Model for Transparency, Accountability, and Trustworthiness
Edward C. Cheng, Jeshua Cheng, and Alice Siu

TL;DR
This paper proposes a Three-Pillar Model, built on transparency, accountability, and trustworthiness, for developing safe and responsible AI agents through progressive validation and human oversight, addressing risks such as bias and goal misalignment.
Contribution
The paper introduces a framework that combines conceptual and operational elements for trustworthy AI, including ongoing collaborative initiatives and open tooling aligned with the model.
Findings
The framework supports the safe development of autonomous agents through staged validation.
Transparency and accountability are foundational for establishing user trust and mitigating risk.
Ongoing projects promote responsible AI evolution and societal trust.
Abstract
This paper presents a conceptual and operational framework for developing and operating safe and trustworthy AI agents based on a Three-Pillar Model grounded in transparency, accountability, and trustworthiness. Building on prior work in Human-in-the-Loop systems, reinforcement learning, and collaborative AI, the framework defines an evolutionary path toward autonomous agents that balances increasing automation with appropriate human oversight. The paper argues that safe agent autonomy must be achieved through progressive validation, analogous to the staged development of autonomous driving, rather than through immediate full automation. Transparency and accountability are identified as foundational requirements for establishing user trust and for mitigating known risks in generative AI systems, including hallucinations, data bias, and goal misalignment, such as the inversion problem.…
Taxonomy
Topics: Ethics and Social Impacts of AI · Human-Automation Interaction and Safety · Adversarial Robustness in Machine Learning
