Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li, Yiran Liu, Xingxing Zhang, Wei Lu, Furu Wei

TL;DR
Tuna is a novel instruction tuning method that leverages feedback from large language models using probabilistic and contextual ranking to improve response quality across multiple benchmarks.
Contribution
The paper introduces Tuna, a new approach for instruction tuning that uses ranking techniques to incorporate feedback from powerful LLMs, enhancing response quality.
Findings
Tuna outperforms several reinforcement learning baselines.
It improves performance on multiple benchmarks.
The method effectively incorporates LLM feedback.
Abstract
Instruction tuning of open-source large language models (LLMs) like LLaMA, using direct outputs from more powerful LLMs such as Instruct-GPT and GPT-4, has proven to be a cost-effective way to align model behaviors with human preferences. However, the instruction-tuned model has only seen one response per instruction, lacking the knowledge of potentially better responses. In this paper, we propose finetuning an instruction-tuned LLM using our novel probabilistic ranking and contextual ranking approaches to increase the likelihood of generating better responses. Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM. On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of…
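The idea of inheriting a teacher's relative rankings can be illustrated with a generic pairwise ranking loss. The sketch below is not taken from the paper: the hinge form, the length-normalized scoring, and the names length_normalized_score, pairwise_ranking_loss, and margin are all assumptions chosen for illustration. It penalizes the student model whenever it assigns a higher score to a response that the teacher ranked lower.

```python
from itertools import combinations


def length_normalized_score(token_logprobs):
    """Average per-token log-probability of one response (assumed scoring rule)."""
    return sum(token_logprobs) / len(token_logprobs)


def pairwise_ranking_loss(scores, teacher_ranks, margin=0.0):
    """Hinge loss summed over all response pairs for one instruction.

    scores:        student scores, one per candidate response
    teacher_ranks: teacher-assigned ranks, lower means better (hypothetical input)
    margin:        minimum gap required between better and worse responses
    """
    loss = 0.0
    for i, j in combinations(range(len(scores)), 2):
        if teacher_ranks[i] == teacher_ranks[j]:
            continue  # ties carry no ordering signal
        better, worse = (i, j) if teacher_ranks[i] < teacher_ranks[j] else (j, i)
        # Penalize only when the worse response scores too close to or above the better one.
        loss += max(0.0, scores[worse] - scores[better] + margin)
    return loss


# Student scores already agree with the teacher's order, so nothing is penalized.
agree = pairwise_ranking_loss([-0.2, -0.5, -1.0], [1, 2, 3])
# The student prefers the response the teacher ranked last, so the loss is positive.
disagree = pairwise_ranking_loss([-0.5, -1.0, -0.2], [1, 2, 3])
```

In an actual finetuning setup the scores would be differentiable quantities computed by the student model, and a ranking term like this would typically be combined with the usual supervised objective; those details are left out here.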
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding
