Toward Safe and Responsible AI Agents: A Three-Pillar Model for Transparency, Accountability, and Trustworthiness

Edward C. Cheng, Jeshua Cheng, and Alice Siu

arXiv:2601.06223 · cs.CY · January 13, 2026

Open Access

TL;DR

This paper proposes a Three-Pillar Model, built on transparency, accountability, and trustworthiness, for developing safe and responsible AI agents through progressive validation and human oversight. The model addresses risks such as data bias and goal misalignment.

Contribution

The paper introduces a framework for trustworthy AI that combines conceptual and practical elements, including ongoing collaborative initiatives and open tooling aligned with the model.

Findings

01. The framework supports the safe development of autonomous agents through staged validation.
02. Transparency and accountability are essential for establishing user trust and mitigating risk.
03. Ongoing projects promote the responsible evolution of AI and strengthen societal trust.

Abstract

This paper presents a conceptual and operational framework for developing and operating safe and trustworthy AI agents based on a Three-Pillar Model grounded in transparency, accountability, and trustworthiness. Building on prior work in Human-in-the-Loop systems, reinforcement learning, and collaborative AI, the framework defines an evolutionary path toward autonomous agents that balances increasing automation with appropriate human oversight. The paper argues that safe agent autonomy must be achieved through progressive validation, analogous to the staged development of autonomous driving, rather than through immediate full automation. Transparency and accountability are identified as foundational requirements for establishing user trust and for mitigating known risks in generative AI systems, including hallucinations, data bias, and goal misalignment, such as the inversion problem.…

Taxonomy

Topics: Ethics and Social Impacts of AI · Human-Automation Interaction and Safety · Adversarial Robustness in Machine Learning