ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Lucen Zhong, Zhengxiao Du, Xiaohan Zhang, Haiyi Hu, Jie Tang

TL;DR
ComplexFuncBench is a new benchmark designed to evaluate large language models' ability to perform complex, multi-step, and constrained function calling in long-context scenarios, revealing current limitations and guiding future improvements.
Contribution
The paper introduces ComplexFuncBench, a comprehensive benchmark with an automatic evaluation framework for complex function calling in long-context scenarios, addressing gaps in existing assessments.
Findings
State-of-the-art LLMs show deficiencies in complex function calling.
ComplexFuncBench covers multi-step and constrained function calling tasks.
The benchmark highlights areas for future model optimization.
Abstract
Enhancing large language models (LLMs) with real-time APIs can help generate more accurate and up-to-date responses. However, evaluating the function calling abilities of LLMs in real-world scenarios remains under-explored due to the complexity of data collection and evaluation. In this work, we introduce ComplexFuncBench, a benchmark for complex function calling across five real-world scenarios. Compared to existing benchmarks, ComplexFuncBench encompasses multi-step and constrained function calling, which requires long-parameter filling, parameter value reasoning, and 128k long context. Additionally, we propose an automatic framework, ComplexEval, for quantitatively evaluating complex function calling tasks. Through comprehensive experiments, we demonstrate the deficiencies of state-of-the-art LLMs in function calling and suggest future directions for optimizing these capabilities. The…
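To make "multi-step and constrained function calling" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tools `search_hotels` and `book_hotel`, their parameters, the returned data, and the exact-match check are hypothetical and are not taken from ComplexFuncBench or ComplexEval. It shows only the general pattern: a second call whose parameter value must be reasoned out from the first call's response under a user constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One function call: a tool name plus its keyword arguments."""
    name: str
    args: tuple  # sorted (key, value) pairs, so calls compare by value


def make_call(name, **kwargs):
    return Call(name, tuple(sorted(kwargs.items())))


# Hypothetical tools with canned responses (illustrative only).
def search_hotels(city):
    return [{"hotel_id": "h1", "price": 180}, {"hotel_id": "h2", "price": 95}]


def book_hotel(hotel_id, nights):
    return {"status": "booked", "hotel_id": hotel_id, "nights": nights}


def run_episode(city, max_price, nights):
    """Two dependent steps: search, then book the cheapest hotel under the cap."""
    trace = [make_call("search_hotels", city=city)]
    hotels = search_hotels(city)

    # Constraint from the user request: the nightly price must not exceed max_price.
    affordable = [h for h in hotels if h["price"] <= max_price]
    if not affordable:
        return trace, None  # nothing satisfies the constraint, so stop here

    # Parameter value reasoning: hotel_id comes from the previous response.
    chosen = min(affordable, key=lambda h: h["price"])
    trace.append(make_call("book_hotel", hotel_id=chosen["hotel_id"], nights=nights))
    return trace, book_hotel(chosen["hotel_id"], nights)


def exact_match(predicted, golden):
    """Simplistic stand-in check: same calls, same arguments, same order."""
    return predicted == golden


trace, result = run_episode("Paris", max_price=100, nights=2)
golden = [
    make_call("search_hotels", city="Paris"),
    make_call("book_hotel", hotel_id="h2", nights=2),
]
print(exact_match(trace, golden))
```

Exact matching of the call trace is used here only because it is the simplest possible check; it would reject equivalent calls that differ in surface form, which is one reason a dedicated evaluation framework is needed for tasks like these.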
Taxonomy
Topics: Complex Systems and Decision Making · Religion and Sociopolitical Dynamics in Nigeria
