How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang; Hamish Ivison; Pradeep Dasigi; Jack Hessel; Tushar Khot; Khyathi Raghavi Chandu; David Wadden; Kelsey MacMillan; Noah A. Smith; Iz Beltagy; Hannaneh Hajishirzi

arXiv:2306.04751·cs.CL·November 1, 2023·37 cites

Open Access · 4 Repos · 10 Models · 3 Datasets · 1 Video

TL;DR

This paper systematically evaluates instruction-tuned open models of various sizes on multiple tasks, revealing strengths, limitations, and the need for improved data and models to match proprietary systems.

Contribution

The paper provides a comprehensive evaluation framework and a new best-performing instruction-tuned model suite called Tülu, and it shows how different instruction datasets affect specific model skills.

Findings

01

Different datasets enhance specific skills but no single dataset excels across all tasks.

02

Model and human preferences do not fully align with benchmark-based evaluations.

03

The best open model in any given evaluation reaches, on average, 87% of ChatGPT performance and 73% of GPT-4 performance.
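The percentages in this finding are relative scores. As a rough sketch of the arithmetic, assuming the figure is obtained by taking the best open model on each evaluation, dividing its score by the proprietary model's score on the same evaluation, and averaging those ratios, the calculation could look like the following. The function name, benchmark names, model names, and all numbers are hypothetical placeholders, not values from the paper.

```python
from statistics import mean


def average_relative_performance(open_scores, reference_scores):
    """Average, over benchmarks, of (best open-model score / reference score).

    open_scores: dict mapping benchmark -> {model name -> score}
    reference_scores: dict mapping benchmark -> reference model score
    This aggregation is an assumption made for illustration.
    """
    ratios = []
    for benchmark, reference in reference_scores.items():
        # Pick the strongest open model separately for each benchmark.
        best_open = max(open_scores[benchmark].values())
        ratios.append(best_open / reference)
    return mean(ratios)


# Placeholder scores, purely illustrative (NOT taken from the paper).
open_scores = {
    "task_a": {"model_x": 30.0, "model_y": 37.5},
    "task_b": {"model_x": 20.0, "model_y": 16.0},
}
reference_scores = {"task_a": 50.0, "task_b": 40.0}

result = average_relative_performance(open_scores, reference_scores)
print(f"average relative performance: {result:.3f}")
```

Note that under this reading the "best model" can differ from one evaluation to the next, so the averaged ratio describes the frontier of open models rather than any single checkpoint.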

Abstract

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied by limited evaluation, making it difficult to compare models across the board and determine the utility of various resources. We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets ranging from manually curated (e.g., OpenAssistant) to synthetic and distilled (e.g., Alpaca), and systematically evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities through a collection of automatic, model-based, and human-based metrics. We further introduce Tülu, our best performing instruction-tuned model suite…

Code & Models

Videos

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings