BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows
Elaine Lau, Markus Dücker, Ronak Chaudhary, Hui Wen Goh, Rosemary Wei, Vaibhav Kumar, Saed Qunbar, Guram Gogia, Yi Liu, Scott Millslagle, Nasim Borazjanizadeh, Ulyana Tkachenko, Samuel Eshun Danquah, Collin Schweiker, Vijay Karumathil, Asrith Devalaraju, Varsha Sandadi

TL;DR
BankerToolBench (BTB) is a new open-source benchmark designed to evaluate AI agents on complex, real-world investment banking tasks, revealing current AI limitations in professional workflows.
Contribution
This work introduces BTB, a realistic, comprehensive benchmark for assessing AI performance in end-to-end investment banking tasks, grounded in industry collaboration.
Findings
Even the best model (GPT-5.4) fails nearly half of rubric criteria.
Bankers rated 0% of the top models' outputs as client-ready.
Key obstacles include cross-artifact consistency issues.
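To make the "fails nearly half of rubric criteria" figure concrete, here is a minimal sketch of how a per-task rubric pass rate could be computed. The `Criterion` class, the `rubric_pass_rate` function, and the example criteria are all hypothetical illustrations, not BTB's actual rubric format or scoring code.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not BTB's schema)."""
    description: str
    passed: bool


def rubric_pass_rate(criteria: list[Criterion]) -> float:
    """Return the fraction of rubric criteria that were satisfied."""
    if not criteria:
        raise ValueError("rubric has no criteria")
    return sum(c.passed for c in criteria) / len(criteria)


# Made-up criteria for illustration only; the first one mimics a
# cross-artifact consistency check between a deck and a model.
example = [
    Criterion("Revenue figure in the deck matches the Excel model", False),
    Criterion("Balance sheet in the model balances", True),
    Criterion("Data sources are cited", True),
    Criterion("Valuation range is stated in the summary", False),
]

rate = rubric_pass_rate(example)
print(f"Pass rate: {rate:.0%}")
```

Under this simple unweighted scheme, a model that fails two of four criteria scores 50%. A real rubric may weight criteria differently.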
Abstract
Existing AI benchmarks lack the fidelity to assess economically meaningful progress on professional workflows. To evaluate frontier AI agents in a high-value, labor-intensive profession, we introduce BankerToolBench (BTB): an open-source benchmark of end-to-end analytical workflows routinely performed by junior investment bankers. To develop an ecologically valid benchmark grounded in representative work environments, we collaborated with 502 investment bankers from leading firms. BTB requires agents to execute senior banker requests by navigating data rooms, using industry tools (market data platform, SEC filings database), and generating multi-file deliverables, including Excel financial models, PowerPoint pitch decks, and PDF/Word reports. Completing a BTB task takes bankers up to 21 hours, underscoring the economic stakes of successfully delegating this work to AI. BTB enables…
