OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions
Yi-Kai Zhang, Xu-Xiang Zhong, Shiyin Lu, Qing-Guo Chen, De-Chuan Zhan, Han-Jia Ye

TL;DR
OmniEvalKit is a modular, lightweight benchmarking toolbox that enables comprehensive evaluation of large language models and their omni-extensions across multilingual, multidomain, and multimodal tasks, supporting over 100 models and 50 datasets.
Contribution
It introduces a flexible, fast-to-deploy evaluation framework whose modular design lets new models and datasets be integrated with little effort, so that multilingual, multidomain, and multimodal capabilities can be assessed within a single toolbox.
Findings
Supports over 100 LLMs and 50 datasets
Enables evaluation across multiple capabilities simultaneously
Provides a lightweight, automated evaluation system
Abstract
The rapid advancements in Large Language Models (LLMs) have significantly expanded their applications, ranging from multilingual support to domain-specific tasks and multimodal integration. In this paper, we present OmniEvalKit, a novel benchmarking toolbox designed to evaluate LLMs and their omni-extensions across multilingual, multidomain, and multimodal capabilities. Unlike existing benchmarks that often focus on a single aspect, OmniEvalKit provides a modular, lightweight, and automated evaluation system. It is structured with a modular architecture comprising a Static Builder and Dynamic Data Flow, promoting the seamless integration of new models and datasets. OmniEvalKit supports over 100 LLMs and 50 evaluation datasets, covering comprehensive evaluations across thousands of model-dataset combinations. OmniEvalKit is dedicated to creating an ultra-lightweight and fast-deployable…
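The modular split described in the abstract can be illustrated with a minimal sketch. It is not OmniEvalKit's actual API: the registries, the `register` decorator, the toy models and datasets, and the exact-match metric are all hypothetical names chosen for illustration. The assumption is that a "static" step builds models and datasets from names, and a "dynamic" step streams each example through the model and scores the output.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registries (assumption): name -> factory function.
# These are illustrative only and do not mirror OmniEvalKit's real code.
MODEL_REGISTRY: Dict[str, Callable[[], Callable[[str], str]]] = {}
DATASET_REGISTRY: Dict[str, Callable[[], List[Tuple[str, str]]]] = {}


def register(registry: dict, name: str):
    """Return a decorator that stores a factory under `name` in `registry`."""
    def wrap(factory):
        registry[name] = factory
        return factory
    return wrap


@register(MODEL_REGISTRY, "echo")
def build_echo_model():
    # Toy model: returns the prompt unchanged.
    return lambda prompt: prompt


@register(MODEL_REGISTRY, "upper")
def build_upper_model():
    # Toy model: returns the prompt in upper case.
    return lambda prompt: prompt.upper()


@register(DATASET_REGISTRY, "toy_copy")
def build_toy_copy():
    # Each item is a (prompt, reference answer) pair.
    return [("abc", "abc"), ("Hi", "Hi")]


@register(DATASET_REGISTRY, "toy_upper")
def build_toy_upper():
    return [("abc", "ABC"), ("Hi", "HI")]


def evaluate(model_name: str, dataset_name: str) -> float:
    """Build both components by name, then score exact-match accuracy."""
    model = MODEL_REGISTRY[model_name]()        # static step: construct
    dataset = DATASET_REGISTRY[dataset_name]()  # static step: construct
    # Dynamic step: pass each prompt through the model and compare.
    correct = sum(model(prompt) == answer for prompt, answer in dataset)
    return correct / len(dataset)


def evaluate_all() -> Dict[Tuple[str, str], float]:
    """Score every registered model on every registered dataset."""
    return {
        (m, d): evaluate(m, d)
        for m in sorted(MODEL_REGISTRY)
        for d in sorted(DATASET_REGISTRY)
    }
```

In this sketch, adding a model or dataset means registering one more factory, and `evaluate_all()` covers the full grid without further changes. The grid grows as models times datasets, so 100 models and 50 datasets would already give 5,000 pairs, which is in line with the "thousands of model-dataset combinations" the abstract mentions.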
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Focus
