Enhancing LLM Agent Safety via Causal Influence Prompting

Dongyoon Hahm; Woogyeol Jin; June Suk Choi; Sungsoo Ahn; Kimin Lee

arXiv:2507.00979·cs.AI·July 2, 2025

TL;DR

This paper presents CIP (Causal Influence Prompting), a method that uses causal influence diagrams to improve the safety of LLM-based autonomous agents by helping them anticipate and mitigate harmful outcomes.

Contribution

The paper introduces CIP, an approach that uses causal influence diagrams to make LLM agents safer through structured decision-making and iterative refinement of the diagram.

Findings

01 Effective safety improvements in code execution tasks
02 Enhanced safety in mobile device control tasks
03 Causal influence diagrams enable better risk mitigation

Abstract

As autonomous agents powered by large language models (LLMs) continue to demonstrate potential across various assistive tasks, ensuring their safe and reliable behavior is crucial for preventing unintended consequences. In this work, we introduce CIP, a novel technique that leverages causal influence diagrams (CIDs) to identify and mitigate risks arising from agent decision-making. CIDs provide a structured representation of cause-and-effect relationships, enabling agents to anticipate harmful outcomes and make safer decisions. Our approach consists of three key steps: (1) initializing a CID based on task specifications to outline the decision-making process, (2) guiding agent interactions with the environment using the CID, and (3) iteratively refining the CID based on observed behaviors and outcomes. Experimental results demonstrate that our method effectively enhances safety in both…
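The three steps in the abstract can be sketched as a small data structure plus three functions. This is only an illustrative sketch, not the authors' implementation: the `CID` class, the node names, the prompt wording, and the `"sudo"` trigger in `refine_cid` are all hypothetical, and both the initialization and refinement steps are hand-written stubs because this excerpt does not say how the paper performs them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Node kinds commonly used in causal influence diagrams.
NODE_KINDS = {"decision", "chance", "utility"}


@dataclass
class CID:
    """Hypothetical minimal causal influence diagram: typed nodes, directed edges."""
    nodes: dict[str, str] = field(default_factory=dict)        # name -> kind
    edges: set[tuple[str, str]] = field(default_factory=set)   # (cause, effect)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def add_edge(self, src: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((src, dst))

    def to_prompt(self) -> str:
        """Render the diagram as plain text that can be placed in a prompt."""
        lines = ["Causal influence diagram:"]
        lines += [f"- [{kind}] {name}" for name, kind in sorted(self.nodes.items())]
        lines += [f"- {src} -> {dst}" for src, dst in sorted(self.edges)]
        return "\n".join(lines)


def init_cid(task: str) -> CID:
    """Step 1 (stub): build an initial CID for a task.

    Assumption: the diagram is hard-coded here for a file-cleanup task;
    the `task` argument is accepted only to mirror the step's interface.
    """
    cid = CID()
    cid.add_node("run_command", "decision")
    cid.add_node("files_deleted", "chance")
    cid.add_node("task_completed", "utility")
    cid.add_node("user_data_safe", "utility")
    cid.add_edge("run_command", "files_deleted")
    cid.add_edge("run_command", "task_completed")
    cid.add_edge("files_deleted", "user_data_safe")
    return cid


def build_agent_prompt(task: str, cid: CID) -> str:
    """Step 2: guide the agent by including the CID in its prompt."""
    return (
        f"Task: {task}\n"
        f"{cid.to_prompt()}\n"
        "Choose the next action and avoid actions that harm a utility node."
    )


def refine_cid(cid: CID, observation: str) -> CID:
    """Step 3 (stub): extend the CID when an observation reveals a new risk.

    Assumption: a simple keyword check stands in for whatever refinement
    procedure the paper actually uses.
    """
    if "sudo" in observation:
        cid.add_node("privilege_escalation", "chance")
        cid.add_edge("run_command", "privilege_escalation")
        cid.add_edge("privilege_escalation", "user_data_safe")
    return cid
```

In use, an agent loop would call `init_cid` once, pass `build_agent_prompt(task, cid)` to the model at each turn, and call `refine_cid` after each observation so that later prompts reflect newly discovered risks.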

Code & Models

Repositories

hahmdy/causal_influence_prompting (Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling