Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping

Wenhao Zhu; Sizhe Liu; Shujian Huang; Shuaijie She; Chris Wendler; Jiajun Chen

arXiv:2407.10795·cs.CL·July 16, 2024

TL;DR

This paper introduces a multilingual contrastive decoding method that improves large language models' reasoning accuracy across 11 languages. It obtains amateur logits by skipping language-agnostic layers, which avoids the language mismatch that makes earlier layer-contrast decoding (DoLa) work poorly on non-English tasks.

Contribution

The paper proposes a contrastive decoding algorithm with two layer-skipping strategies that improves multilingual LLM performance, especially on reasoning tasks.

Findings

01

Outperforms previous contrastive decoding baselines.

02

Significantly improves reasoning accuracy across 11 languages.

03

Effective for diverse languages beyond English.

Abstract

Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between early exit output and final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method…
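
The contrast between expert and amateur logits described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, and the `alpha` plausibility cutoff (a filter commonly used in contrastive decoding, not stated on this page) are assumptions made for illustration. The amateur logits are simply given here; how they are produced (early exit in DoLa, layer skipping in this paper) is outside the sketch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=0.1):
    """Score each token by log p_expert - log p_amateur.

    Assumption: tokens whose expert probability is below
    alpha * (max expert probability) are masked out with -inf, so the
    contrast cannot promote tokens the expert finds implausible.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    cutoff = max(expert_lp) + math.log(alpha)
    return [
        e - a if e >= cutoff else float("-inf")
        for e, a in zip(expert_lp, amateur_lp)
    ]


def pick_token(expert_logits, amateur_logits, alpha=0.1):
    """Greedy choice under the contrastive score."""
    scores = contrastive_scores(expert_logits, amateur_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 3-token vocabulary. The expert slightly prefers token 0,
# but the amateur prefers it much more strongly, so the contrast
# favours token 1, where the expert has gained the most confidence.
expert = [2.0, 1.8, -3.0]
amateur = [2.0, 0.5, -3.0]
chosen = pick_token(expert, amateur)
```

In this toy case plain greedy decoding on the expert logits would pick token 0, while the contrastive score picks token 1, and token 2 is excluded by the plausibility cutoff. The sketch also shows why the amateur's quality matters: if the amateur distribution is in a different language from the expert's, the difference mostly reflects that mismatch rather than anything useful about the answer.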

Code & Models

Repositories

njunlp/skiplayercd
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Algorithms and Data Compression

Methods: Sparse Evolutionary Training