Evaluating the Robustness to Instructions of Large Language Models

Yuansheng Ni; Sichao Jiang; Xinyu Wu; Hui Shen; Yuli Zhou

arXiv:2308.14306 · cs.CL · November 28, 2023 · 2 cites

TL;DR

This paper evaluates the performance and robustness of instruction-tuned large language models on seen and unseen tasks, using relation extraction as a case study. It finds that performance drops on unfamiliar instructions and that robustness varies with model size.

Contribution

It provides a comprehensive evaluation of multiple instruction-tuned LLMs on real-world tasks, highlighting robustness issues and size-related performance trends.

Findings

01. Performance drops on unseen instructions, especially for relation extraction (RE).

02. Models are less robust to RE instructions than to question answering (QA) instructions.

03. Performance of FLAN-T5 improves with size up to 3B parameters.
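The findings above compare scores under seen and unseen instructions. As a minimal sketch of how such a comparison could be summarized, the Python snippet below computes the mean score for each group, the drop between them, and the spread across unseen instructions. The function name `robustness_summary`, the dictionary keys, and the scores are all hypothetical and invented for illustration; they are not the authors' metric or their results.

```python
from statistics import mean, pstdev


def robustness_summary(scores_seen, scores_unseen):
    """Summarize score changes between seen and unseen instruction phrasings.

    Both arguments are lists of per-instruction scores (for example, F1)
    for one model on one task. This is an illustrative summary only,
    not the evaluation protocol used in the paper.
    """
    seen_avg = mean(scores_seen)
    unseen_avg = mean(scores_unseen)
    return {
        "seen_mean": seen_avg,
        "unseen_mean": unseen_avg,
        # Positive value: the model scores lower on unseen instructions.
        "absolute_drop": seen_avg - unseen_avg,
        # Population standard deviation across unseen instructions;
        # a larger value means more sensitivity to phrasing.
        "unseen_spread": pstdev(scores_unseen),
    }


# Placeholder scores, made up for this example (not taken from the paper).
summary = robustness_summary(
    scores_seen=[0.50, 0.75],
    scores_unseen=[0.25, 0.50],
)
print(summary)
```

Reporting the spread alongside the mean drop separates a model that is uniformly worse on unseen instructions from one that is only unstable across phrasings.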

Abstract

Recently, instruction fine-tuning has risen to prominence as a potential method for enhancing the zero-shot capabilities of Large Language Models (LLMs) on novel tasks. This technique has shown an exceptional ability to boost the performance of moderately sized LLMs, sometimes even reaching performance levels comparable to those of much larger model variants. The focus is on the robustness of instruction-tuned LLMs to seen and unseen tasks. We conducted an exploration of six models, including Alpaca, Vicuna, WizardLM, and traditional task-oriented models (Flan-T5-XL/XXL, T0++), using real-world relation extraction datasets as case studies. We carried out a comprehensive evaluation of these instruction-following LLMs, which have been tuned on open-domain instructions and task-oriented instructions. The main discussion is their performance and robustness towards instructions. We have…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Flan-T5 · Focus