BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

Junho Myung; Nayeon Lee; Yi Zhou; Jiho Jin; Rifki Afina Putri; Dimosthenis Antypas; Hsuvas Borkakoty; Eunsu Kim; Carla Perez-Almendros; Abinew Ali Ayele; Víctor Gutiérrez-Basulto; Yazmín Ibáñez-García; Hwaran Lee; Shamsuddeen Hassan Muhammad; Kiwoong Park; Anar Sabuhi Rzayev; Nina White; Seid Muhie Yimam; Mohammad Taher Pilehvar; Nedjma Ousidhoum; Jose Camacho-Collados; Alice Oh

arXiv:2406.09948·cs.CL·January 17, 2025·5 cites

TL;DR

BLEnD is a comprehensive benchmark designed to evaluate large language models' knowledge of everyday cultural facts across diverse regions and languages, highlighting disparities in model performance based on cultural and linguistic representation.

Contribution

This paper introduces BLEnD, a hand-crafted benchmark of 52.6k question-answer pairs from 16 countries/regions in 13 languages, addressing the gap in evaluating LLMs' everyday cultural knowledge beyond high-resource languages.

Findings

01  LLMs perform better on cultures that are highly represented online.

02  Performance varies significantly between languages and cultures.

03  For cultures represented by low-resource languages, models perform better when queried in English than in the local language.

Abstract

Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. BLEnD comprises 52.6k question-answer pairs from 16 countries/regions, in 13…

Code & Models

Repositories

nlee0212/blend (official)

Videos

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages · slideslive

Taxonomy

Topics: Library Science and Information Systems

Methods: Attention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer