Dissecting In-Context Learning of Translations in GPTs

Vikas Raunak; Hany Hassan Awadalla; Arul Menezes

arXiv:2310.15987·cs.CL·October 25, 2023·1 cites

Open Access

TL;DR

This paper investigates how different demonstration attributes affect in-context learning of translations in GPT models. It finds that the output text distribution carries the most important learning signal and proposes a method to improve zero-shot translation performance.

Contribution

It uncovers the impact of source and target perturbations on translation quality and introduces Zero-Shot-Context to improve zero-shot GPT translation.

Findings

01 Target perturbation drastically reduces translation quality

02 Source perturbation has little impact on results

03 Zero-Shot-Context improves zero-shot translation, rivaling few-shot methods

Abstract

Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yields vastly different results. We show that the perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the…
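To make the asymmetric-perturbation setup concrete, here is a minimal sketch of how few-shot demonstrations might be perturbed on one side only before a prompt is assembled. The abstract does not say which perturbations were used, so the word-order shuffling below is an assumption chosen for illustration. The helper names (`shuffle_words`, `perturb_demos`, `build_prompt`), the `German:`/`English:` prompt layout, and the example sentences are likewise hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

Demo = Tuple[str, str]  # (source sentence, target translation)


def shuffle_words(sentence: str, rng: random.Random) -> str:
    """Return the sentence with its whitespace-separated words in random order.

    Assumption: word shuffling stands in for "perturbation"; the abstract
    does not specify the perturbation types actually studied.
    """
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def perturb_demos(demos: List[Demo], side: str, seed: int = 0) -> List[Demo]:
    """Perturb only the source side or only the target side of each demo."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    rng = random.Random(seed)
    perturbed = []
    for src, tgt in demos:
        if side == "source":
            perturbed.append((shuffle_words(src, rng), tgt))
        else:
            perturbed.append((src, shuffle_words(tgt, rng)))
    return perturbed


def build_prompt(demos: List[Demo], test_source: str,
                 src_lang: str = "German", tgt_lang: str = "English") -> str:
    """Assemble a few-shot prompt (hypothetical layout) ending at the slot
    where the model would write the translation of test_source."""
    lines = []
    for src, tgt in demos:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {test_source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        ("Das ist ein kleiner Test .", "This is a small test ."),
        ("Ich habe heute viel Arbeit .", "I have a lot of work today ."),
    ]
    # Compare the two conditions: same demos, perturbation on one side only.
    for side in ("source", "target"):
        print(f"--- {side} perturbed ---")
        print(build_prompt(perturb_demos(demos, side), "Guten Morgen ."))
```

Sending both prompts to the same model and scoring the outputs with a translation quality metric would give the kind of source-versus-target comparison the abstract describes. The model call and the metric are deliberately left out, since neither is specified here.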

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · Dropout · Weight Decay · Layer Normalization · Linear Warmup With Cosine Annealing