Better Alignment with Instruction Back-and-Forth Translation

Thao Nguyen; Jeffrey Li; Sewoong Oh; Ludwig Schmidt; Jason Weston; Luke Zettlemoyer; Xian Li

arXiv:2408.04614 · cs.CL · August 15, 2024

TL;DR

This paper introduces instruction back-and-forth translation, a method for creating high-quality synthetic data from web documents to improve large language model alignment, outperforming existing datasets in evaluation.

Contribution

The paper presents a novel back-and-forth translation approach to synthetic data generation that improves LLM alignment by combining the diversity of web documents with the quality of LLM-rewritten responses.

Findings

01

Fine-tuning on the resulting data yields higher AlpacaEval win rates than fine-tuning on other instruction datasets.

02

Rewriting responses with an LLM yields better results than direct distillation.

03

Synthetic instructions are of higher quality and responses are more diverse and complex.
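Finding 01 is stated in terms of AlpacaEval win rates, i.e. how often an automatic judge prefers the evaluated model's output over a reference model's output across a fixed set of prompts. The sketch below only shows that arithmetic; the `win_rate` helper is hypothetical, and counting a tie as half a win is an assumed convention, not a detail taken from the paper.

```python
from typing import Sequence


def win_rate(preferences: Sequence[float]) -> float:
    """Percentage of pairwise comparisons won by the evaluated model.

    Each entry is the judge's verdict for one prompt: 1.0 if the evaluated
    model's output is preferred, 0.0 if the reference output is preferred,
    and 0.5 for a tie (an assumed convention for this sketch).
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    if any(p not in (0.0, 0.5, 1.0) for p in preferences):
        raise ValueError("verdicts must be 0.0, 0.5 or 1.0")
    return 100.0 * sum(preferences) / len(preferences)


# Four hypothetical judge verdicts: win, loss, win, tie.
verdicts = [1.0, 0.0, 1.0, 0.5]
print(f"{win_rate(verdicts):.1f}%")  # 62.5%
```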

Abstract

We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions…
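A minimal Python sketch of the three steps the abstract names: backtranslate an instruction from a web document, curate the (instruction, document) pair, then rewrite the document into a response. A single generic `llm` callable stands in for whichever models the authors actually used; the prompt wording, the 1-5 rating scale, the `min_score` threshold, and the offline `fake_llm` are all hypothetical illustrations, not the paper's prompts or settings.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Pair:
    instruction: str
    response: str


def backtranslate(doc: str, llm: LLM) -> str:
    """Step 1: ask the model for an instruction that the document would answer."""
    prompt = ("Write a user instruction for which the following text "
              f"would be a good answer.\n\nText:\n{doc}\n\nInstruction:")
    return llm(prompt).strip()


def quality_score(instruction: str, doc: str, llm: LLM) -> int:
    """Step 2: ask the model to rate the pair on an assumed 1-5 scale."""
    prompt = ("Rate from 1 to 5 how well the text answers the instruction.\n\n"
              f"Instruction: {instruction}\nText: {doc}\nScore:")
    match = re.search(r"[1-5]", llm(prompt))
    return int(match.group()) if match else 0  # unparseable rating counts as a fail


def rewrite(instruction: str, doc: str, llm: LLM) -> str:
    """Step 3: turn the raw web text into a response grounded in that text."""
    prompt = ("Using the text below as the source of facts, write a clear, "
              "well-organized response to the instruction.\n\n"
              f"Instruction: {instruction}\nText: {doc}\nResponse:")
    return llm(prompt).strip()


def back_and_forth(docs: List[str], llm: LLM, min_score: int = 5) -> List[Pair]:
    pairs: List[Pair] = []
    for doc in docs:
        instruction = backtranslate(doc, llm)
        if quality_score(instruction, doc, llm) < min_score:
            continue  # curation: drop pairs rated below the (assumed) threshold
        pairs.append(Pair(instruction, rewrite(instruction, doc, llm)))
    return pairs


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stand-in that keys its reply off the prompt text."""
    if prompt.startswith("Write a user instruction"):
        return "Explain how photosynthesis works."
    if prompt.startswith("Rate"):
        return "5" if "chlorophyll" in prompt else "2"
    return "Photosynthesis converts light energy into chemical energy."


docs = ["Plants use chlorophyll to capture light ...", "Buy cheap watches now!"]
pairs = back_and_forth(docs, fake_llm)
for p in pairs:
    print(p.instruction, "->", p.response)
```

In this sketch curation runs before rewriting, following the order in the abstract, so the rewrite call is only spent on pairs that pass the filter; the spam-like second document is dropped.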

Code & Models

Models

🤗 Alepach/notHumpback-M1-Rw-F-8b (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques