SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Chenming Tang, Zhixiang Wang, Yunfang Wu

TL;DR
This paper introduces SCOI, a method that selects in-context examples for machine translation by combining syntactic and lexical coverage, improving translation quality with large language models.
Contribution
SCOI is presented as the first approach to incorporate deep syntactic structure into in-context example selection for machine translation, improving on existing learning-free selection methods.
Findings
SCOI achieves the highest average COMET score among learning-free methods.
Combining syntactic and lexical coverage improves example selection quality.
Experiments on six translation directions demonstrate SCOI's effectiveness.
Abstract
In-context learning (ICL) greatly improves the performance of large language models (LLMs) on various downstream tasks, where the improvement depends heavily on the quality of demonstrations. In this work, we introduce syntactic knowledge to select better in-context examples for machine translation (MT). We propose a new strategy, namely Syntax-augmented COverage-based In-context example selection (SCOI), leveraging the deep syntactic structure beyond conventional word matching. Specifically, we measure the set-level syntactic coverage by computing the coverage of polynomial terms with the help of a simplified tree-to-polynomial algorithm, and lexical coverage using word overlap. Furthermore, we devise an alternate selection approach to combine both coverage measures, taking advantage of syntactic and lexical information. We conduct experiments with two multilingual LLMs on six…
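The alternating selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each sentence's syntactic polynomial terms have already been computed and are given as a set of strings (the tree-to-polynomial step is not reproduced), it uses a simple fraction-covered score for both measures, and it assumes selection starts with the syntactic step. The names `coverage` and `alternate_select` are hypothetical.

```python
from typing import List, Set, Tuple

# One candidate example: (syntactic terms, source-side words).
# Both sets are assumed to be precomputed; how the terms are derived
# from a parse tree is outside the scope of this sketch.
Candidate = Tuple[Set[str], Set[str]]


def coverage(target: Set[str], covered: Set[str]) -> float:
    """Fraction of the target items that appear in the covered set."""
    if not target:
        return 0.0
    return len(target & covered) / len(target)


def alternate_select(
    test_syn: Set[str],
    test_lex: Set[str],
    candidates: List[Candidate],
    k: int,
) -> List[int]:
    """Greedily pick k candidate indices, alternating the criterion.

    Even steps maximise set-level syntactic coverage of the test input,
    odd steps maximise set-level lexical coverage (assumed ordering).
    """
    selected: List[int] = []
    covered_syn: Set[str] = set()
    covered_lex: Set[str] = set()
    remaining = list(range(len(candidates)))

    for step in range(min(k, len(candidates))):
        use_syntax = step % 2 == 0

        def score(i: int) -> float:
            syn, lex = candidates[i]
            if use_syntax:
                return coverage(test_syn, covered_syn | syn)
            return coverage(test_lex, covered_lex | lex)

        # max() keeps the earliest candidate on ties, so output is deterministic.
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        covered_syn |= candidates[best][0]
        covered_lex |= candidates[best][1]

    return selected


if __name__ == "__main__":
    pool: List[Candidate] = [
        ({"a"}, {"the"}),
        ({"a", "b"}, {"dog"}),
        ({"x"}, {"cat", "sat"}),
        ({"c"}, {"the"}),
    ]
    print(alternate_select({"a", "b", "c"}, {"the", "cat", "sat"}, pool, k=3))
```

Scoring the union of already-selected items with each candidate, rather than each candidate alone, is what makes the measure set-level: a candidate is rewarded only for what it adds to the examples chosen so far.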
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
