ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models

Trong-Hieu Nguyen; Anh-Cuong Le; Viet-Cuong Nguyen

arXiv:2404.11086 · cs.CL · April 19, 2024 · 1 citation

TL;DR

ViLLM-Eval is a new comprehensive benchmark designed to evaluate Vietnamese large language models across various tasks and disciplines, revealing significant room for improvement in their Vietnamese language understanding.

Contribution

This work introduces ViLLM-Eval, the first extensive evaluation suite for Vietnamese LLMs, covering multiple tasks and difficulty levels to assess their capabilities.

Findings

01 Advanced models still have significant gaps in Vietnamese understanding

02 ViLLM-Eval effectively identifies strengths and weaknesses of Vietnamese LLMs

03 Benchmark promotes development of better Vietnamese language models

Abstract

The rapid advancement of large language models (LLMs) necessitates new benchmarks that accurately assess their capabilities. To address this need for Vietnamese, this work introduces ViLLM-Eval, a comprehensive evaluation suite designed to measure the advanced knowledge and reasoning abilities of foundation models within a Vietnamese context. ViLLM-Eval consists of multiple-choice questions and next-word prediction tasks spanning various difficulty levels and diverse disciplines, ranging from the humanities to science and engineering. A thorough evaluation of the most advanced LLMs on ViLLM-Eval revealed that even the best-performing models have significant room for improvement in understanding and responding to Vietnamese language tasks. We believe ViLLM-Eval will be instrumental in identifying key strengths and weaknesses of foundation models, ultimately promoting…
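
The abstract names two task formats, multiple-choice questions and next-word prediction, without giving the scoring procedure. The following is a minimal sketch of how plain accuracy could be computed for each format. It is not the authors' evaluation code: the field names (`question`, `choices`, `answer_index`), the toy items, and the baseline "models" (`always_first`, `echo_last_word`) are all hypothetical and do not reflect the actual ViLLM-Eval schema or metrics.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One multiple-choice question. Field names are hypothetical."""
    question: str
    choices: Tuple[str, ...]
    answer_index: int  # position of the correct choice in `choices`


def multiple_choice_accuracy(
    items: Sequence[MultipleChoiceItem],
    choose: Callable[[str, Tuple[str, ...]], int],
) -> float:
    """Fraction of items where the chosen index equals the gold index."""
    if not items:
        return 0.0
    correct = sum(
        choose(item.question, item.choices) == item.answer_index
        for item in items
    )
    return correct / len(items)


def next_word_accuracy(
    pairs: Sequence[Tuple[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of (context, gold_word) pairs predicted by exact match."""
    if not pairs:
        return 0.0
    correct = sum(predict(context).strip() == gold for context, gold in pairs)
    return correct / len(pairs)


# Hypothetical baselines standing in for a real model.
def always_first(question: str, choices: Tuple[str, ...]) -> int:
    return 0  # always pick the first choice


def echo_last_word(context: str) -> str:
    return context.split()[-1]  # repeat the last word of the context


# Toy items invented for illustration; not taken from the dataset.
MC_ITEMS = [
    MultipleChoiceItem(
        "Thủ đô của Việt Nam là gì?",
        ("Hà Nội", "Huế", "Đà Nẵng", "Cần Thơ"),
        0,
    ),
    MultipleChoiceItem("2 + 3 bằng bao nhiêu?", ("4", "5", "6", "7"), 1),
]
NEXT_WORD_PAIRS = [
    ("Hà Nội là thủ đô của Việt", "Nam"),
    ("một hai ba ba", "ba"),
]

mc_score = multiple_choice_accuracy(MC_ITEMS, always_first)
nw_score = next_word_accuracy(NEXT_WORD_PAIRS, echo_last_word)
print(f"multiple-choice accuracy: {mc_score:.2f}")
print(f"next-word accuracy: {nw_score:.2f}")
```

In practice the `choose` and `predict` callables would wrap a real model, and exact string match for next-word prediction is only one possible criterion; the paper may normalize or score predictions differently.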

Code & Models

Datasets

vlsp-2023-vllm/ViLLM-Eval
dataset · 187 downloads

Taxonomy

Topics: Natural Language Processing Techniques