ScaleCall -- Agentic Tool Calling at Scale for Fintech: Challenges, Methods, and Deployment Insights
Richard Osuagwu, Thomas Cook, Maraim Masoud, Koustav Ghosal, Riccardo Mattivi

TL;DR
This paper presents ScaleCall, a framework for tool calling in fintech enterprise environments, analyzing retrieval methods and deployment challenges to optimize accuracy and efficiency within regulatory constraints.
Contribution
It introduces ScaleCall, a novel tool-calling framework tailored for regulated enterprise settings, and provides a comprehensive evaluation of retrieval methods and deployment insights.
Findings
Embedding-based retrieval offers low latency for large tool sets.
Listwise ranking improves disambiguation of overlapping tools.
Hybrid approaches balance accuracy and efficiency in deployment.
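The three findings above can be read as a two-stage pipeline: a cheap embedding search shortlists candidates from the full catalogue, and a listwise ranker then orders that shortlist. The sketch below is illustrative only and is not the ScaleCall implementation. The tool names, the descriptions, the bag-of-words `embed` function (a stand-in for a real embedding model), and the keyword-overlap `listwise_rerank` function (a stand-in for an LLM prompt that sees all candidates at once) are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical tool catalogue; names and descriptions are illustrative only.
TOOLS = {
    "get_transaction_status": "look up the status of a card transaction by id",
    "get_transaction_history": "list past transaction records for an account",
    "convert_currency": "convert an amount between two currencies",
    "run_etl_job": "start a data engineering pipeline job",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedding_shortlist(query, tools, k):
    """Stage 1: rank the whole catalogue by similarity and keep the top k."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]


def listwise_rerank(query, candidates):
    """Stage 2: placeholder for a prompt-based listwise ranker.

    A real system would present every candidate to an LLM in a single prompt
    and ask for an ordering. Here the candidates are ordered by how many
    query words appear in the tool name, purely so the sketch is runnable.
    """
    query_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda name: len(query_words & set(name.split("_"))),
        reverse=True,
    )


def select_tool(query, tools=TOOLS, k=2):
    """Hybrid retrieval: embedding shortlist followed by listwise re-ranking."""
    shortlist = embedding_shortlist(query, tools, k)
    return listwise_rerank(query, shortlist)[0]


if __name__ == "__main__":
    print(select_tool("status of transaction 123"))
```

The shortlist size `k` is the knob that trades accuracy against cost in this sketch: a larger `k` gives the second stage more chances to recover the right tool among overlapping ones, at the price of a longer ranking prompt.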
Abstract
While Large Language Models (LLMs) excel at tool calling, deploying these capabilities in regulated enterprise environments such as fintech presents unique challenges due to on-premises constraints, regulatory compliance requirements, and the need to disambiguate large, functionally overlapping toolsets. In this paper, we present a comprehensive study of tool retrieval methods for enterprise environments through the development and deployment of ScaleCall, a prototype tool-calling framework within Mastercard designed for orchestrating internal APIs and automating data engineering workflows. We systematically evaluate embedding-based retrieval, prompt-based listwise ranking, and hybrid approaches, revealing that method effectiveness depends heavily on domain-specific factors rather than inherent algorithmic superiority. Through empirical investigation on enterprise-derived benchmarks, we…
Taxonomy
Topics: Software Engineering Research · Mobile Crowdsensing and Crowdsourcing · Spreadsheets and End-User Computing
