# The Uneven Impact of Post-Training Quantization in Machine Translation

**Authors:** Benjamin Marie, Atsushi Fujita

arXiv: 2508.20893 · 2025-08-29

## TL;DR

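This paper evaluates how post-training quantization (PTQ) affects LLM-based machine translation across 55 languages and five models ranging from 1.7B to 70B parameters. 4-bit quantization often preserves quality for high-resource languages and large models, but quality degrades significantly for low-resource and typologically diverse languages, particularly at 2 bits.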

## Contribution

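The paper presents what the authors describe as the first large-scale evaluation of PTQ for machine translation. It compares four quantization techniques (AWQ, BitsAndBytes, GGUF, and AutoRound) and quantifies how quantization interacts with model size, decoding hyperparameters, and the choice of calibration language.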

## Key findings

- 4-bit quantization often preserves translation quality for high-resource languages and large models
- 2-bit quantization significantly degrades translation for low-resource and typologically diverse languages
- GGUF variants offer the most consistent performance across settings, even at 2-bit precision
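
To give a feel for why the bit width matters so much in the findings above, here is a minimal sketch of plain round-to-nearest quantization applied to small groups of weights. It is not any of the four techniques the paper compares (AWQ, BitsAndBytes, GGUF, and AutoRound are all more sophisticated), and the function names, the group size of 32, and the synthetic Gaussian weights are assumptions made purely for illustration.

```python
import random


def quantize_rtn(weights, bits):
    """Asymmetric round-to-nearest quantization of one weight group.

    Returns the dequantized values, i.e. what the model would actually
    compute with after the weights are stored at `bits` precision.
    """
    levels = 2 ** bits - 1                    # steps between group min and max
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0   # guard against constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return [w_min + c * scale for c in codes]


def mean_abs_error(weights, bits, group_size=32):
    """Average reconstruction error when quantizing group by group."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        restored = quantize_rtn(group, bits)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)


# Synthetic stand-in for a weight tensor (hypothetical, not from the paper).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: mean_abs_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit mean absolute error: {err:.6f}")
```

With only four representable values per group at 2 bits, compared with sixteen at 4 bits, the reconstruction error grows sharply. This sketch only measures weight error; how that error translates into translation quality for a given language is what the paper evaluates empirically.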

## Abstract

Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored. We conduct the first large-scale evaluation of post-training quantization (PTQ) on machine translation across 55 languages using five LLMs ranging from 1.7B to 70B parameters. Our analysis reveals that while 4-bit quantization often preserves translation quality for high-resource languages and large models, significant degradation occurs for low-resource and typologically diverse languages, particularly in 2-bit settings. We compare four quantization techniques (AWQ, BitsAndBytes, GGUF, and AutoRound), showing that algorithm choice and model size jointly determine robustness. GGUF variants provide the most consistent performance, even at 2-bit precision. Additionally, we quantify the interactions between quantization, decoding hyperparameters, and calibration languages, finding that language-matched calibration offers benefits primarily in low-bit scenarios. Our findings offer actionable insights for deploying multilingual LLMs for machine translation under quantization constraints, especially in low-resource settings.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20893/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/2508.20893/full.md

---
Source: https://tomesphere.com/paper/2508.20893