Sherlock: Reliable and Efficient Agentic Workflow Execution
Yeonju Ro, Haoran Qiu, Íñigo Goiri, Rodrigo Fonseca, Ricardo Bianchini, Aditya Akella, Zhangyang Wang, Mattan Erez, Esha Choukse

TL;DR
Sherlock is a method that improves the reliability and efficiency of large language model workflows by selectively verifying error-prone steps using counterfactual analysis, reducing latency and costs while boosting accuracy.
Contribution
Sherlock introduces a fault-aware, selective verification approach that uses counterfactual analysis to decide where verification should be placed, minimizing verification overhead in agentic workflows.
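The counterfactual idea can be illustrated with a small sketch. This is not Sherlock's implementation, whose fault model and selection policy are not given in this summary. The sketch assumes a linear workflow, a single placeholder fault value, and a fixed budget on how many steps may be verified. The names `run`, `sensitivity`, `pick_verified`, `FAULT`, and the four-step toy workflow are all hypothetical. For each step, the sketch replaces that step's output with a faulty value, reruns the downstream steps, and counts how often the final answer changes. The steps whose faults most often reach the output are the ones worth verifying.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
# (step name, function that computes this step's output from the running state)
Step = Tuple[str, Callable[[State], str]]

FAULT = "<faulty>"  # hypothetical stand-in for an erroneous LLM or tool output


def run(steps: List[Step], query: str, corrupt: str = "") -> str:
    """Run a linear workflow and return the last step's output.

    If `corrupt` names a step, that step's output is replaced by FAULT
    (the counterfactual) and every downstream step consumes the faulty value."""
    state: State = {"query": query}
    for name, fn in steps:
        state[name] = FAULT if name == corrupt else fn(state)
    return state[steps[-1][0]]


def sensitivity(steps: List[Step], queries: List[str]) -> Dict[str, float]:
    """Fraction of queries whose final output changes when each step is faulted."""
    scores: Dict[str, float] = {}
    for name, _ in steps:
        changed = sum(run(steps, q) != run(steps, q, corrupt=name) for q in queries)
        scores[name] = changed / len(queries)
    return scores


def pick_verified(scores: Dict[str, float], budget: int) -> List[str]:
    """Verify only the `budget` steps whose faults most often reach the output.
    Ties keep workflow order because Python's sort is stable."""
    return sorted(scores, key=lambda n: scores[n], reverse=True)[:budget]


# Hypothetical toy workflow: extract a city, look up its country, log, answer.
TABLE = {"paris": "france", "tokyo": "japan"}
steps: List[Step] = [
    ("extract", lambda s: s["query"].split()[0].lower()),
    ("lookup", lambda s: TABLE.get(s["extract"], "unknown")),
    ("log", lambda s: f"looked up {s['extract']}"),  # side output, never read downstream
    ("answer", lambda s: s["lookup"].upper()),
]

queries = ["Paris weather", "Tokyo food", "Lima trains"]
scores = sensitivity(steps, queries)
print(scores)
print(pick_verified(scores, budget=2))
```

Here `pick_verified(scores, budget=2)` returns `['lookup', 'answer']`. A fault in the side-output `log` step never reaches the answer, and a fault in `extract` is masked for the query whose lookup already fails, so verifying those steps would add cost with less effect on the result.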
Findings
Achieves an 18.3% accuracy improvement over the baseline.
Reduces execution time by up to 48.7%.
Lowers verification costs by 26.0%.
Abstract
With the increasing adoption of large language models (LLMs), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: an incorrect or partially correct output at one step can propagate, or even be amplified, through subsequent stages, compounding its impact on the final output. Recent work proposes integrating verifiers that validate LLM outputs or actions, such as self-reflection, debate, or LLM-as-a-judge mechanisms. Yet verifying every step introduces significant latency and cost overheads. In this work, we seek to answer three key questions: which nodes in a workflow are most error-prone and thus deserve costly verification, how should the most appropriate verifier be selected for each node, and how can verification be used with minimal impact on latency? Our…
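The trade-off raised in the abstract, where errors compound across steps but verifying everything is expensive, can be made concrete with a toy model. The sketch assumes three things: per-step errors are independent, any uncaught error corrupts the final output (ignoring masking), and an error caught by a verifier is fixed by a correct retry. All step names, probabilities, recalls, and costs are invented for illustration and are not figures from the paper, and the helpers `success`, `cost`, and `best_subset` are hypothetical. The sketch enumerates verifier placements and keeps the one with the highest end-to-end success within a cost budget.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Hypothetical per-step numbers: success probability of the unverified step,
# and the recall and cost of the verifier that could be attached to it.
STEPS: Dict[str, Tuple[float, float, float]] = {
    #  name       p_ok  recall  cost
    "plan":     (0.90, 0.80,   3.0),
    "retrieve": (0.97, 0.50,   1.0),
    "reason":   (0.85, 0.70,   4.0),
    "format":   (0.99, 0.90,   1.0),
}


def success(verified: Sequence[str]) -> float:
    """End-to-end success probability: the product of per-step success, where
    a verified step only fails if its verifier misses the error."""
    total = 1.0
    for name, (p_ok, recall, _) in STEPS.items():
        p_err = 1.0 - p_ok
        if name in verified:
            p_err *= 1.0 - recall  # only errors the verifier misses survive
        total *= 1.0 - p_err
    return total


def cost(verified: Sequence[str]) -> float:
    """Total verification cost of a placement."""
    return sum(STEPS[n][2] for n in verified)


def best_subset(budget: float) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively pick the placement with the highest success within budget."""
    best: Tuple[Tuple[str, ...], float] = ((), success(()))
    for k in range(1, len(STEPS) + 1):
        for subset in combinations(STEPS, k):
            if cost(subset) <= budget and success(subset) > best[1]:
                best = (subset, success(subset))
    return best


print(f"no verification:  {success(()):.3f}")
print(f"verify all steps: {success(tuple(STEPS)):.3f} at cost {cost(tuple(STEPS))}")
print(best_subset(5.0))
```

With these made-up numbers, no verification gives about 0.735 end-to-end success and verifying every step gives about 0.921 at cost 9. By contrast, `best_subset(5.0)` verifies only `retrieve` and `reason` and reaches about 0.838. Exhaustive search is exponential in the number of steps, so a practical system would need a cheaper selection heuristic. This toy model also ignores latency, which the abstract names as the third concern.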
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Natural Language Processing Techniques
