Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Nirav Bhan; Shival Gupta; Sai Manaswini; Ritik Baba; Narun Yadav; Hillori Desai; Yash Choudhary; Aman Pawar; Sarthak Shrivastava; Sudipta Biswas

arXiv:2410.17950 · cs.AI · October 24, 2024

Open Access

TL;DR

This paper presents ThorV2, a new architecture that improves LLMs' function calling, outperforming leading models in accuracy, reliability, latency, and cost on a CRM benchmark, with better scalability for multi-step tasks.

Contribution

Introduction of ThorV2, a novel architecture that significantly enhances LLMs' function calling abilities and scalability, evaluated through a comprehensive CRM benchmark.

Findings

01

ThorV2 outperforms OpenAI and Anthropic models in accuracy and reliability.

02

ThorV2 demonstrates lower latency and cost efficiency.

03

ThorV2 scales better to multi-step tasks than traditional models.
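
A simple idealized calculation shows why scaling to multi-step tasks is a meaningful test. This is an illustrative assumption, not the paper's analysis: if each of the k calls in a task succeeds independently with probability p_i, the whole task succeeds only if every call does, so small per-call error rates compound as k grows.

```latex
P_{\text{task}} = \prod_{i=1}^{k} p_i = p^{k} \quad \text{when every } p_i = p,
\qquad \text{e.g. } p = 0.95,\; k = 3 \;\Rightarrow\; P_{\text{task}} = 0.95^{3} \approx 0.857
```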

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in various domains, yet their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhances LLMs' function-calling abilities. We develop a comprehensive benchmark focused on HubSpot CRM operations to evaluate ThorV2 against leading models from OpenAI and Anthropic. Our results demonstrate that ThorV2 outperforms existing models in accuracy, reliability, latency, and cost efficiency for both single- and multi-API calling tasks. We also show that ThorV2 is far more reliable and scales better to multi-step tasks than traditional models. Our work offers the tantalizing possibility of more accurate function calling than today's best-performing models while using significantly smaller LLMs. These advancements have significant…

Taxonomy

Topics: Natural Language Processing Techniques · Neural Networks and Applications · Image Processing and 3D Reconstruction