Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis

Daoyang Li; Haiyan Zhao; Qingcheng Zeng; Mengnan Du

arXiv:2409.14459·cs.CL·February 3, 2025

TL;DR

This paper probes the internal representations of large language models across multiple languages, revealing disparities between high-resource and low-resource languages in probing accuracy, layer-wise accuracy trends, and the similarity of probing vectors.

Contribution

It extends probing techniques to a multilingual setting, providing a comprehensive analysis of LLM behaviors across diverse languages and highlighting resource-based performance gaps.

Findings

01. High-resource languages achieve significantly higher probing accuracy than low-resource languages.

02. Layer-wise accuracy trends diverge: high-resource languages improve substantially in deeper layers, similar to English.

03. Probing vectors are more similar among high-resource languages than among low-resource languages.

Abstract

Probing techniques for large language models (LLMs) have primarily focused on English, overlooking the vast majority of the world's languages. In this paper, we extend these probing methods to a multilingual context, investigating the behaviors of LLMs across diverse languages. We conduct experiments on several open-source LLMs, analyzing probing accuracy, trends across layers, and similarities between probing vectors for multiple languages. Our key findings reveal: (1) a consistent performance gap between high-resource and low-resource languages, with high-resource languages achieving significantly higher probing accuracy; (2) divergent layer-wise accuracy trends, where high-resource languages show substantial improvement in deeper layers similar to English; and (3) higher representational similarities among high-resource languages, with low-resource languages demonstrating lower…
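To make the three quantities in the abstract concrete (probing accuracy, per-language comparison, and similarity between probing vectors), here is a minimal sketch. It is not the authors' implementation. It assumes a logistic-regression probe, uses synthetic vectors in place of real LLM hidden states, and the names `lang_a`, `lang_b`, `train_linear_probe`, and `synthetic_hidden_states` are hypothetical. The signal strengths are chosen by hand for illustration and do not reproduce any result from the paper.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent (assumed probe type)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of label 1
        grad = p - y                             # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose predicted label matches y."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())

def cosine_similarity(u, v):
    """Cosine of the angle between two probing vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def synthetic_hidden_states(rng, direction, signal, n=400):
    """Stand-in for one layer's hidden states: the label shifts points along `direction`."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, direction.shape[0]))
    X += signal * np.outer(2 * y - 1, direction)
    return X, y

rng = np.random.default_rng(0)
dim = 16
shared = rng.normal(size=dim)
shared /= np.linalg.norm(shared)

# Two hypothetical languages; the signal strength is an illustrative assumption.
results = {}
for name, signal in [("lang_a", 2.0), ("lang_b", 0.3)]:
    X, y = synthetic_hidden_states(rng, shared, signal)
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    w, b = train_linear_probe(X_train, y_train)
    results[name] = {"acc": probe_accuracy(X_test, y_test, w, b), "w": w}

sim = cosine_similarity(results["lang_a"]["w"], results["lang_b"]["w"])
for name, r in results.items():
    print(name, "probing accuracy:", r["acc"])
print("cosine similarity between probing vectors:", sim)
```

In a real experiment, `X` would hold hidden states extracted from one layer of an LLM for prompts in a given language, and the same procedure would be repeated per layer to obtain a layer-wise accuracy curve.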

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling