Instruction Position Matters in Sequence Generation with Large Language Models

Yijin Liu; Xianfeng Zeng; Fandong Meng; Jie Zhou

arXiv:2308.12097 · cs.CL · August 24, 2023 · 1 citation

TL;DR

This paper shows that placing the task instruction after the input sentence in the fine-tuning data improves large language models' ability to follow instructions, especially for long inputs, and raises zero-shot translation and summarization performance.

Contribution

The paper introduces a simple method of shifting instructions after input sentences to improve instruction-following in LLMs, backed by theoretical analysis and extensive experiments.

Findings

01. Improved zero-shot translation performance, up to 9.7 BLEU points.

02. Consistent gains across model scales and tasks.

03. No additional data or annotation required.

Abstract

Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization, through instruction fine-tuning. The fine-tuning data is generally sequentially concatenated from a specific task instruction, an input sentence, and the corresponding response. Considering the locality modeled by the self-attention mechanism of LLMs, these models face the risk of instruction forgetting when generating responses for long input sentences. To mitigate this issue, we propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences. Theoretical analysis suggests that our straightforward method can alter the model's learning focus, thereby emphasizing the training of instruction-following capabilities. Concurrently, experimental results demonstrate that our approach…
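
The reordering described in the abstract can be illustrated with a minimal sketch. The function name `build_example`, the newline separator, and the sample strings are hypothetical choices made for illustration; they are not the template used in the authors' repository. The sketch only shows the difference between the conventional layout (instruction, input, response) and the post-instruction layout (input, instruction, response).

```python
from typing import Tuple


def build_example(
    instruction: str,
    source: str,
    response: str,
    post_instruction: bool = False,
    sep: str = "\n",  # assumed separator, not taken from the paper
) -> Tuple[str, str]:
    """Return (prompt, target) for one fine-tuning example.

    post_instruction=False: instruction, then input (conventional layout).
    post_instruction=True:  input, then instruction, so the instruction
                            sits directly before the response.
    """
    parts = [source, instruction] if post_instruction else [instruction, source]
    prompt = sep.join(parts) + sep
    return prompt, response


# Hypothetical sample data for illustration only.
instruction = "Translate the following sentence from German to English."
source = "Das Wetter ist heute schön."
response = "The weather is nice today."

prompt_pre, target_pre = build_example(instruction, source, response)
prompt_post, target_post = build_example(
    instruction, source, response, post_instruction=True
)

print(prompt_pre + target_pre)
print("---")
print(prompt_post + target_post)
```

In the post-instruction layout the instruction is the last thing the model reads before it starts generating, so its distance from the response no longer grows with the length of the input.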

Code & Models

Repositories

adaxry/post-instruction (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification