Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows

Yifei Ming; Zixuan Ke; Xuan-Phi Nguyen; Jiayu Wang; and Shafiq Joty

arXiv:2506.03332 · cs.AI · June 5, 2025

TL;DR

This paper systematically analyzes vulnerabilities in agentic workflows where models critique each other, revealing that even strong agents can be misled by flawed feedback, and introduces benchmarks to evaluate robustness.

Contribution

The paper develops a two-dimensional taxonomy of judge behavior (intent and knowledge), creates the WAFER-QA benchmark, and uncovers fundamental vulnerabilities in feedback-driven agentic workflows.

Findings

01 Agents are vulnerable to persuasive flawed critiques.

02 Misleading feedback can cause correct answers to be overturned.

03 Behavioral patterns differ between reasoning and non-reasoning models.
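
One way to make the second finding measurable is to count how often an answer that was correct before the critique becomes incorrect after it. The sketch below is illustrative only: the `Trial` record, the `overturn_rate` function, and the toy data are all hypothetical names and values introduced here, and exact string matching stands in for whatever answer-checking a real evaluation would use. It is not the paper's metric or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question answered twice: before and after receiving a critique."""
    gold: str    # reference answer
    before: str  # agent's answer before seeing the judge's feedback
    after: str   # agent's answer after seeing the judge's feedback


def overturn_rate(trials):
    """Fraction of initially correct answers that are incorrect after feedback.

    Assumption: correctness is exact string equality with the gold answer.
    """
    initially_correct = [t for t in trials if t.before == t.gold]
    if not initially_correct:
        return 0.0
    flipped = sum(1 for t in initially_correct if t.after != t.gold)
    return flipped / len(initially_correct)


# Toy data, invented for illustration only.
trials = [
    Trial(gold="Paris", before="Paris", after="Lyon"),  # correct, then overturned
    Trial(gold="4", before="4", after="4"),             # correct, held firm
    Trial(gold="blue", before="red", after="blue"),     # wrong at first, not counted
    Trial(gold="7", before="7", after="7"),             # correct, held firm
]

rate = overturn_rate(trials)
print(f"overturn rate: {rate:.2f}")
```

Conditioning on initially correct answers keeps the number from being diluted by questions the agent would have missed anyway.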

Abstract

Agentic workflows -- where multiple large language model (LLM) instances interact to solve tasks -- are increasingly built on feedback mechanisms, where one model evaluates and critiques another. Despite the promise of feedback-driven improvement, the stability of agentic workflows rests on the reliability of the judge. However, judges may hallucinate information, exhibit bias, or act adversarially -- introducing critical vulnerabilities into the workflow. In this work, we present a systematic analysis of agentic workflows under deceptive or misleading feedback. We introduce a two-dimensional framework for analyzing judge behavior, along axes of intent (from constructive to malicious) and knowledge (from parametric-only to retrieval-augmented systems). Using this taxonomy, we construct a suite of judge behaviors and develop WAFER-QA, a new benchmark with critiques grounded in retrieved…
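
The two axes named in the abstract, intent and knowledge, can be pictured as a small configuration space for judges. The following is a minimal sketch under stated assumptions: it reduces each axis to its two endpoints (the abstract describes ranges, not binary choices), and `JudgeProfile`, `make_judge`, and the template-based critiques are hypothetical stand-ins rather than the paper's implementation. A real judge would be an LLM call, not a string template.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Intent(Enum):
    # Endpoints of the intent axis named in the abstract.
    CONSTRUCTIVE = "constructive"
    MALICIOUS = "malicious"


class Knowledge(Enum):
    # Endpoints of the knowledge axis named in the abstract.
    PARAMETRIC_ONLY = "parametric-only"
    RETRIEVAL_AUGMENTED = "retrieval-augmented"


@dataclass(frozen=True)
class JudgeProfile:
    intent: Intent
    knowledge: Knowledge


def make_judge(profile: JudgeProfile, gold: str,
               evidence: Optional[str] = None) -> Callable[[str], str]:
    """Build a toy judge (hypothetical): a function from answer to critique."""
    def judge(answer: str) -> str:
        is_correct = answer == gold
        if profile.intent is Intent.MALICIOUS:
            # Assumption: a malicious judge simply inverts the true verdict.
            is_correct = not is_correct
        verdict = "correct" if is_correct else "incorrect"
        critique = f"Your answer '{answer}' looks {verdict}."
        if profile.knowledge is Knowledge.RETRIEVAL_AUGMENTED and evidence:
            # A retrieval-augmented judge can attach a passage to its critique.
            critique += f" Evidence: {evidence}"
        return critique
    return judge


honest = make_judge(JudgeProfile(Intent.CONSTRUCTIVE, Knowledge.PARAMETRIC_ONLY),
                    gold="Paris")
deceptive = make_judge(JudgeProfile(Intent.MALICIOUS, Knowledge.RETRIEVAL_AUGMENTED),
                       gold="Paris", evidence="(placeholder retrieved passage)")

print(honest("Paris"))
print(deceptive("Paris"))
```

Keeping the profile separate from the judge function makes it easy to sweep all four corner combinations against the same agent and compare its behavior.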

Code & Models

Datasets

Salesforce/WAFER-QA
dataset · 8 dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)