Aegis: Towards Governance, Integrity, and Security of AI Voice Agents

Xiang Li; Pin-Yu Chen; Wenqi Wei

arXiv:2602.07379·cs.CR·February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Aegis, a comprehensive red-teaming framework to evaluate and improve the security, governance, and integrity of AI voice agents, revealing vulnerabilities and guiding layered defense strategies.

Contribution

Aegis is the first systematic framework modeling realistic deployment scenarios to identify critical security risks in AI voice agents, including behavioral vulnerabilities.

Findings

01

Voice agents are vulnerable to behavioral attacks despite access controls.

02

Open-weight models show higher susceptibility to attacks.

03

Layered defenses are necessary for securing voice agents.

Abstract

With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as banking, customer service, and IT support. However, their vulnerabilities to adversarial misuse remain unexplored. While prior work has examined aspects of trustworthiness in ALLMs, such as harmful content generation and hallucination, systematic security evaluations of voice agents are still lacking. To address this gap, we propose Aegis, a red-teaming framework for the governance, integrity, and security of voice agents. Aegis models the realistic deployment pipeline of voice agents and designs structured adversarial scenarios covering critical risks such as privacy leakage, privilege escalation, and resource abuse. We evaluate the framework through case studies in banking call centers, IT support, and logistics. Our evaluation shows…
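The peer reviews further down this page describe the evaluation as 5 attack objectives crossed with 5 attacker personas, applied to three case-study domains. A minimal sketch of how such a scenario grid could be enumerated follows. It is an illustration, not the authors' implementation: only four objectives are named on this page (three in the abstract, authentication bypass in a review), so the fifth objective and all persona names are placeholders, and crossing every domain with every objective and persona is an assumption.

```python
from dataclasses import dataclass
from itertools import product

# Four objectives are named on this page; the fifth is a placeholder (assumption).
OBJECTIVES = [
    "privacy_leakage",
    "privilege_escalation",
    "resource_abuse",
    "authentication_bypass",
    "objective_5_unspecified",
]
# Persona names are not given on this page; these are hypothetical placeholders.
PERSONAS = [f"persona_{i}" for i in range(1, 6)]
# Case-study domains listed in the abstract.
DOMAINS = ["banking", "it_support", "logistics"]


@dataclass(frozen=True)
class Scenario:
    """One adversarial test case: a domain, an attack objective, and a persona."""
    domain: str
    objective: str
    persona: str


def build_grid(domains=DOMAINS, objectives=OBJECTIVES, personas=PERSONAS):
    """Return the full cross product of domains, objectives, and personas."""
    return [Scenario(d, o, p) for d, o, p in product(domains, objectives, personas)]


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid))  # 3 domains * 5 objectives * 5 personas = 75
    print(grid[0])
```

Under these assumptions, each `Scenario` would be handed to an attack agent and scored by an evaluator; those components are outside this sketch.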

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- The Aegis framework goes beyond traditional model-level robustness evaluations and offers a realistic assessment of deployed systems in diverse, high-risk domains.
- By considering a broad set of adversarial scenarios, the framework offers valuable insights into various vulnerabilities and highlights real-world risks that existing models fail to address.

Weaknesses

- Red-teaming frameworks for ALLMs have been studied before, although the authors claim this work focuses on more realistic assessment. The contribution of this paper therefore seems unclear.
- The framework relies heavily on certain attack scenarios, such as authentication bypass and resource abuse, but the paper could benefit from exploring additional advanced adversarial tactics. For instance, attacks exploiting AI’s cognitive biases in interpreting complex dialogues could be a future avenue f

Reviewer 02 · Rating 2 · Confidence 4

Strengths

(1) The paper is clearly structured and well-organized.
(2) The manuscript is free of grammatical and typographical errors.

Weaknesses

(1) **Limited evaluation of practical voice agents** Although the paper evaluates the governance, integrity, and security of AI voice agents, it primarily focuses on backbone models rather than complete, deployed agent systems. Real-world AI voice agents typically include multiple components, such as data processing, safeguard, and storage modules, in addition to the backbone model. Therefore, restricting the evaluation to backbone models does not provide a comprehensive understanding of the se

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper focuses on a highly important application: adversarial attacks on voice agents. In particular, the authors focus on 3 customer service applications (banking, IT, and logistics) where LLMs are already deployed.
2. The evaluation framework is rigorous and reproducible. The 5 attack objectives x 5 attacker personas are relevant for many domains, as the authors show in their case studies.
3. The evaluation results are novel and interesting, showing that some attack vectors remain challengin

Weaknesses

1. Some details of the evaluation setup are confusing. In section 3.3 and Figure 2, the language around "attacker", "attack agent", "agent", and "evaluator" could be made clearer. It appears these are all LLMs, the "attack agent" is always GPT-4o, the "backbone agent" is one of 7 models, and the "evaluator" is always GPT-4o? Using consistent language like "attack agent" and "backbone agent" might be helpful.
2. While the paper conducts a thorough evaluation of 7 "backbone agents", I would have

Taxonomy

Topics: Adversarial Robustness in Machine Learning · AI in Service Interactions · Topic Modeling