Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents
Daud Waqas, Aaryamaan Golthi, Erika Hayashida, Huanzhi Mao

TL;DR
This paper introduces Assertion-Conditioned Compliance (A-CC), a new evaluation method for multi-turn tool-calling language models, revealing their vulnerability to misleading assertions and policy conflicts in safety-critical applications.
Contribution
The paper proposes A-CC as a novel holistic evaluation paradigm for multi-turn function-calling models, highlighting their susceptibility to assertion-based vulnerabilities.
Findings
Models are vulnerable to user-sourced assertion sycophancy.
Models exhibit compliance issues with contradictory system policies.
A-CC reveals latent vulnerabilities in deployed agents.
Abstract
Multi-turn tool-calling LLMs (models capable of invoking external APIs or tools across several user turns) have emerged as a key feature in modern AI assistants, enabling extended dialogues from benign tasks to critical business, medical, and financial operations. Yet implementing multi-turn pipelines remains difficult for many safety-critical industries due to ongoing concerns regarding model resilience. While standardized benchmarks such as the Berkeley Function-Calling Leaderboard (BFCL) have underpinned confidence concerning advanced function-calling models (like Salesforce's xLAM V2), there is still a lack of visibility into multi-turn conversation-level robustness, especially given their exposure to real-world systems. In this paper, we introduce Assertion-Conditioned Compliance (A-CC), a novel evaluation paradigm for multi-turn function-calling dialogues. A-CC provides holistic…
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
