Tuna: Instruction Tuning using Feedback from Large Language Models

Haoran Li; Yiran Liu; Xingxing Zhang; Wei Lu; Furu Wei

arXiv:2310.13385 · cs.CL · October 23, 2023 · 1 citation

TL;DR

Tuna is an instruction tuning method that finetunes an instruction-tuned LLM with feedback from more powerful LLMs, using probabilistic and contextual ranking, to improve response quality across multiple benchmarks.

Contribution

The paper introduces Tuna, a new approach for instruction tuning that uses ranking techniques to incorporate feedback from powerful LLMs, enhancing response quality.

Findings

01

Tuna outperforms several reinforcement learning baselines.

02

It improves performance on multiple benchmarks.

03

The method effectively incorporates LLM feedback.

Abstract

Instruction tuning of open-source large language models (LLMs) like LLaMA, using direct outputs from more powerful LLMs such as Instruct-GPT and GPT-4, has proven to be a cost-effective way to align model behaviors with human preferences. However, the instruction-tuned model has only seen one response per instruction, lacking the knowledge of potentially better responses. In this paper, we propose finetuning an instruction-tuned LLM using our novel probabilistic ranking and contextual ranking approaches to increase the likelihood of generating better responses. Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM. On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of…
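
The abstract describes both ranking approaches only at a high level. The Python sketch below illustrates one plausible way to turn a teacher's ranking into a training signal; it is not the authors' implementation. It assumes (1) candidate responses are ordered best-first by a teacher score, such as the teacher LLM's log-probability (probabilistic ranking) or a ranking the teacher produces after seeing all candidates together (contextual ranking), and (2) the student is trained with a pairwise margin ranking loss of the kind used in BRIO, applied to the student's own length-normalized log-likelihoods. The function names, the length normalization, and the default `margin` are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def length_normalized_logprob(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability of one response.

    Hypothetical scoring choice: normalizing by length keeps short
    responses from being favored just because they have fewer tokens.
    """
    return sum(token_logprobs) / len(token_logprobs)


def rank_by_teacher(responses: List[str], teacher_scores: List[float]) -> List[str]:
    """Return responses ordered best-first by a teacher score (higher is better)."""
    order = sorted(range(len(responses)), key=lambda i: teacher_scores[i], reverse=True)
    return [responses[i] for i in order]


def pairwise_ranking_loss(student_scores: List[float], margin: float = 0.1) -> float:
    """Pairwise margin ranking loss (assumed form, in the style of BRIO).

    student_scores[k] is the student's score for the k-th response in the
    teacher's best-first order. For every pair i < j (i ranked higher), the
    student is penalized unless s_i exceeds s_j by at least margin * (j - i).
    """
    loss = 0.0
    n = len(student_scores)
    for i in range(n):
        for j in range(i + 1, n):
            loss += max(0.0, student_scores[j] - student_scores[i] + margin * (j - i))
    return loss


if __name__ == "__main__":
    # Toy example with made-up scores for three candidate responses.
    responses = ["r1", "r2", "r3"]
    teacher_scores = [-0.5, -2.0, -1.0]  # higher means the teacher prefers it
    ranked = rank_by_teacher(responses, teacher_scores)  # ["r1", "r3", "r2"]
    # Hypothetical per-token log-probabilities from the student model.
    student_token_logprobs = {"r1": [-1.0, -1.2], "r2": [-0.4, -0.6], "r3": [-0.9, -1.1]}
    student_scores = [length_normalized_logprob(student_token_logprobs[r]) for r in ranked]
    # The loss is positive here: the student prefers r2, which the teacher ranks last.
    print(ranked, pairwise_ranking_loss(student_scores))
```

Scaling the margin by the rank gap `j - i` asks the student for a larger separation between responses that sit further apart in the teacher's ranking, while pairs already separated by more than the margin contribute nothing.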

Code & Models

Repositories

microsoft/lmops (jax · Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding