Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Xiao Liu; Jianfeng Lin; Jiawei Zhang

arXiv:2311.13053·cs.CL·November 23, 2023·1 cites

TL;DR

This paper introduces MultiAPI, a large-scale benchmark dataset for evaluating large language models' multimodal capabilities, revealing strengths in API decision-making but challenges in domain understanding and argument generation.

Contribution

The study presents MultiAPI, a novel comprehensive benchmark dataset designed to evaluate and analyze LLMs' proficiency in multimodal and tool-augmented tasks.

Findings

01. LLMs are proficient in API call decision-making.
02. Challenges remain in domain identification and argument generation.
03. Auxiliary context can impair LLM performance.

Abstract

The proliferation of Large Language Models like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel in text-based tasks, overlooking the complexity of real-world multimodal information. This study introduces MultiAPI, a pioneering, comprehensive, large-scale API benchmark dataset aimed at expanding LLMs' proficiency in multimodal contexts. Developed collaboratively through ChatGPT, MultiAPI consists of 235 diverse API calls and 2,038 contextual prompts, offering a unique platform for evaluating tool-augmented LLMs on multimodal tasks. Through comprehensive experiments, our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation. What's more, we surprisingly…
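The four aspects named in the abstract (API call decision-making, domain identification, function selection, argument generation) can be read as successive checks on a predicted API call. The following is a minimal sketch of that reading, not the paper's evaluation code: the `ApiCall` record, its field names, the exact-match comparison, and the rule that a later aspect counts only when the earlier ones are correct are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    """Hypothetical record for one API call; field names are illustrative."""
    needs_api: bool                      # should an API be called at all?
    domain: str = ""                     # e.g. "image", "audio" (assumed labels)
    function: str = ""                   # name of the selected function
    arguments: dict = field(default_factory=dict)


def score_call(pred: ApiCall, gold: ApiCall) -> dict:
    """Compare a predicted call with a reference call on four aspects.

    Assumption: each aspect is an exact match and is credited only if
    every earlier aspect was also correct.
    """
    decision = pred.needs_api == gold.needs_api
    domain = decision and pred.domain == gold.domain
    function = domain and pred.function == gold.function
    arguments = function and pred.arguments == gold.arguments
    return {
        "decision": decision,
        "domain": domain,
        "function": function,
        "arguments": arguments,
    }


def aggregate(pairs: list) -> dict:
    """Average each aspect over a list of (predicted, reference) pairs."""
    totals = {}
    for pred, gold in pairs:
        for aspect, ok in score_call(pred, gold).items():
            totals[aspect] = totals.get(aspect, 0) + int(ok)
    n = len(pairs)
    return {aspect: count / n for aspect, count in totals.items()} if n else {}
```

Under these assumptions, a model that always decides correctly whether to call an API but often picks the wrong domain would show a high `decision` score and lower scores on the remaining three aspects, which is the pattern the abstract describes.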

Code & Models

Repositories

haroldliuj/multiapi (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques