Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows
Yifei Ming, Zixuan Ke, Xuan-Phi Nguyen, Jiayu Wang, and Shafiq Joty

TL;DR
This paper systematically analyzes vulnerabilities in agentic workflows where models critique each other, revealing that even strong agents can be misled by flawed feedback, and introduces benchmarks to evaluate robustness.
Contribution
It develops a taxonomy of judge behavior, creates the WAFER-QA benchmark, and uncovers fundamental vulnerabilities in feedback-driven agentic workflows.
Findings
Agents are vulnerable to persuasive flawed critiques.
Misleading feedback can cause correct answers to be overturned.
Behavioral patterns differ between reasoning and non-reasoning models.
Abstract
Agentic workflows -- where multiple large language model (LLM) instances interact to solve tasks -- are increasingly built on feedback mechanisms, where one model evaluates and critiques another. Despite the promise of feedback-driven improvement, the stability of agentic workflows rests on the reliability of the judge. However, judges may hallucinate information, exhibit bias, or act adversarially -- introducing critical vulnerabilities into the workflow. In this work, we present a systematic analysis of agentic workflows under deceptive or misleading feedback. We introduce a two-dimensional framework for analyzing judge behavior, along axes of intent (from constructive to malicious) and knowledge (from parametric-only to retrieval-augmented systems). Using this taxonomy, we construct a suite of judge behaviors and develop WAFER-QA, a new benchmark with critiques grounded in retrieved…
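The two-dimensional framework described in the abstract can be pictured as a small grid of judge profiles. The sketch below is only an illustration of that idea, not the authors' implementation: the names `Intent`, `Knowledge`, `JudgeProfile`, and `all_profiles` are hypothetical, and only the endpoints of each axis named in the abstract are modeled (the abstract describes each axis as a range, so intermediate points are omitted here).

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List


class Intent(Enum):
    # Endpoints of the intent axis as named in the abstract.
    CONSTRUCTIVE = "constructive"
    MALICIOUS = "malicious"


class Knowledge(Enum):
    # Endpoints of the knowledge axis as named in the abstract.
    PARAMETRIC_ONLY = "parametric-only"
    RETRIEVAL_AUGMENTED = "retrieval-augmented"


@dataclass(frozen=True)
class JudgeProfile:
    """Hypothetical container for one cell of the intent x knowledge grid."""
    intent: Intent
    knowledge: Knowledge

    def label(self) -> str:
        # Human-readable name for this cell of the grid.
        return f"{self.intent.value} / {self.knowledge.value}"


def all_profiles() -> List[JudgeProfile]:
    """Enumerate every combination of the two axes (assumed to be independent)."""
    return [JudgeProfile(i, k) for i, k in product(Intent, Knowledge)]


if __name__ == "__main__":
    for profile in all_profiles():
        print(profile.label())
```

Treating the axes as independent makes it straightforward to list every judge configuration an evaluation would need to cover. Whether the paper enumerates its judge behaviors this way is not stated in the abstract.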
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
