TrustAgent: Towards Safe and Trustworthy LLM-based Agents

Wenyue Hua; Xianjun Yang; Mingyu Jin; Zelong Li; Wei Cheng; Ruixiang Tang; Yongfeng Zhang

arXiv:2402.01586 · cs.CL · October 7, 2024 · 2 cites

TL;DR

TrustAgent is a framework that enhances the safety and trustworthiness of LLM-based agents in high-stakes environments by integrating safety strategies at multiple stages of plan generation.

Contribution

The paper introduces an Agent-Constitution-based framework with pre-planning, in-planning, and post-planning strategies to improve the safety and helpfulness of LLM agents.

Findings

1. Effective safety enhancement across multiple domains.

2. Improved agent helpfulness through safety measures.

3. Highlighting the role of LLM reasoning in safety adherence.

Abstract

The rise of LLM-based agents shows great potential to revolutionize task planning and has captured significant attention. Given that these agents will be integrated into high-stakes domains, ensuring their reliability and safety is crucial. This paper presents TrustAgent, an Agent-Constitution-based agent framework with a particular focus on improving the safety of LLM-based agents. The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that enhances safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection. Our experimental results demonstrate that the proposed framework can effectively enhance an LLM agent's safety across multiple domains by identifying and mitigating potential…
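
The three stages named in the abstract can be pictured with a minimal sketch. This is not the authors' implementation and does not reflect the code in the official repository: the `Rule` type, the function names, the keyword-based checker, and the `fake_llm` stand-in are all hypothetical, chosen only to show where each strategy might sit relative to plan generation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    """One hypothetical constitution entry: a description plus a checker."""
    name: str
    description: str
    violates: Callable[[str], bool]  # True if a plan step breaks this rule


def pre_planning(task: str, constitution: List[Rule]) -> str:
    """Pre-planning: place safety knowledge in the prompt before planning starts."""
    rules = "\n".join(f"- {r.name}: {r.description}" for r in constitution)
    return f"Safety rules:\n{rules}\n\nTask: {task}"


def in_planning(candidates: List[str], constitution: List[Rule]) -> List[str]:
    """In-planning: drop unsafe candidate steps while the plan is being built."""
    return [s for s in candidates if not any(r.violates(s) for r in constitution)]


def post_planning(plan: List[str], constitution: List[Rule]) -> List[Tuple[str, str]]:
    """Post-planning: inspect the finished plan and report (step, rule) violations."""
    return [(s, r.name) for s in plan for r in constitution if r.violates(s)]


def run_agent(task: str, propose: Callable[[str], List[str]], constitution: List[Rule]):
    """Chain the three stages; `propose` stands in for an LLM planner (assumption)."""
    prompt = pre_planning(task, constitution)
    plan = in_planning(propose(prompt), constitution)
    return plan, post_planning(plan, constitution)


# Toy rule and toy planner, both invented for illustration only.
no_delete = Rule("no-delete", "Never delete user files.",
                 lambda step: "delete" in step.lower())


def fake_llm(prompt: str) -> List[str]:
    return ["List files in ~/docs", "Delete ~/docs/old", "Summarize remaining files"]


plan, violations = run_agent("Tidy my documents", fake_llm, [no_delete])
print(plan)        # steps that survived the in-planning filter
print(violations)  # anything the final inspection still flags
```

In this toy setup the post-planning check is redundant with the in-planning filter. A real system would presumably use different, stronger checks at each stage, for example an LLM-based inspector rather than a keyword match.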

Code & Models

Repositories

agiresearch/trustagent (official)

Taxonomy

Topics: Access Control and Trust · Cloud Data Security Solutions

Methods: Focus