BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data
Xuwu Wang, Qiwen Cui, Yunzhe Tao, Yiran Wang, Ziwei Chai, Xiaotian, Han, Boyi Liu, Jianbo Yuan, Jing Su, Guoyin Wang, Tingkai Liu, Liyu Chen,, Tianyi Liu, Tao Sun, Yufeng Zhang, Sirui Zheng, Quanzeng You, Yang Yang,, Hongxia Yang

TL;DR
BabelBench is a comprehensive benchmark framework designed to evaluate large language models' abilities in handling multimodal, multistructured data with code execution, revealing significant room for improvement even in state-of-the-art models.
Contribution
This paper introduces BabelBench, a novel unified benchmark with 247 problems for assessing LLMs on multimodal, multistructured data processing and reasoning tasks.
Findings
ChatGPT-4 shows substantial room for improvement.
BabelBench covers perception, reasoning, and debugging tasks.
Provides guidance for future research in multimodal data handling.
Abstract
Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse data handling scenarios. In response, we introduce BabelBench, an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution. BabelBench incorporates a dataset comprising 247 meticulously curated problems that challenge the models with tasks in perception, commonsense reasoning, logical reasoning, and so on. Besides the basic capabilities of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques
MethodsSoftmax · Attention Is All You Need
