LongFuncEval: Measuring the effectiveness of long context models for function calling
Kiran Kate, Tejaswini Pedapati, Kinjal Basu, Yara Rizk, Vijil Chenthamarakshan, Subhajit Chaudhury, Mayank Agarwal, Ibrahim Abdelaziz

TL;DR
This paper evaluates how well large language models perform in calling external tools within long context conversations, revealing significant performance drops as context length and complexity increase, and highlighting the need for further improvements.
Contribution
It is the first comprehensive study of long context understanding in LLMs specifically for tool calling, including new benchmarks and analysis of performance degradation.
Findings
Performance drops up to 85% with more tools
Answer retrieval degrades up to 91% with longer responses
Multi-turn conversation accuracy decreases by up to 40%
Abstract
Multiple recent studies have documented large language models' (LLMs) performance on calling external tools/functions. Others focused on LLMs' abilities to handle longer context lengths. At the intersection of these areas lies another interesting problem: LLMs' abilities to accurately perform function calls in long context settings. Particularly, when calling tools, LLMs are encumbered by three predominant challenges: (1) a large catalog of tools, (2) long responses from the tool APIs, and (3) long multi-turn conversations. These challenges are particularly relevant to enterprise applications of LLMs which engage in multi-turn conversations with users to complete complex tasks that require a large catalog of complex tools. The literature contains multiple investigations of long context challenges such as lost in the middle or needle in the haystack for natural language tasks. In this…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Personal Information Management and User Behavior
