How Does Quantization Affect Multilingual LLMs?

Kelly Marchisio; Saurabh Dash; Hongyu Chen; Dennis Aumiller; Ahmet Üstün; Sara Hooker; Sebastian Ruder

arXiv:2407.03211·cs.CL·October 15, 2024·2 cites

TL;DR

This paper investigates how quantization impacts the performance of multilingual large language models across different languages and tasks, revealing significant discrepancies between automatic metrics and human evaluations.

Contribution

It provides a comprehensive analysis of quantization effects on multilingual LLMs, highlighting language disparities and the importance of human evaluation for accurate assessment.

Findings

01 Quantization causes more significant performance drops in non-Latin script languages.

02 Automatic metrics underestimate the true impact of quantization compared to human judgments.

03 Challenging tasks like mathematical reasoning are most affected by quantization.

Abstract

Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none have evaluated across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics severely underestimate: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks like mathematical reasoning degrade fastest. As the ability to serve low-compute…
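For readers unfamiliar with the term, the following is a minimal sketch of what weight quantization does, assuming simple symmetric round-to-nearest int8 quantization with a single scale per tensor. This excerpt does not state which quantization methods the paper evaluates, so the scheme, the function names (`quantize_absmax_int8`, `dequantize`, `relative_drop`), and the example values are illustrative assumptions, not the authors' setup.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric absmax scaling, chosen here for illustration only.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


def relative_drop(baseline_score, quantized_score):
    """Percentage change of a quantized model's score relative to its baseline."""
    return 100.0 * (quantized_score - baseline_score) / baseline_score


# Hypothetical weights, not taken from any real model.
weights = [0.5, -1.27, 0.0, 1.0, 0.3333]
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest integer code bounds the per-weight error by scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error introduced per weight is small, but it accumulates across layers; the abstract's reported percentage drops measure the resulting change in task performance, which is the kind of quantity `relative_drop` computes from a baseline score and a quantized-model score.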

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification
