Evaluating the Robustness to Instructions of Large Language Models
Yuansheng Ni, Sichao Jiang, Xinyu Wu, Hui Shen, Yuli Zhou

TL;DR
This paper evaluates how instruction-tuned large language models perform and maintain robustness on seen and unseen tasks, especially relation extraction, revealing performance drops with unfamiliar instructions and size-dependent robustness patterns.
Contribution
It provides a comprehensive evaluation of multiple instruction-tuned LLMs on real-world tasks, highlighting robustness issues and size-related performance trends.
Findings
Performance drops on unseen instructions, especially for relation extraction.
Robustness to RE instructions is worse than to QA instructions.
Performance of FLAN-T5 improves with size up to 3B parameters.
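The first two findings compare scores obtained under seen and unseen instructions. A minimal sketch of how such a comparison could be tabulated is given here. It is not the paper's evaluation code: the function name, the input lists of per-instruction scores, and the use of standard deviation as a spread measure are all illustrative assumptions.

```python
from statistics import mean, pstdev


def robustness_summary(seen_scores, unseen_scores):
    """Summarize score changes between seen and unseen instructions.

    Both arguments are lists of per-instruction scores (for example F1),
    one entry per instruction phrasing. Hypothetical helper for illustration.
    """
    if not seen_scores or not unseen_scores:
        raise ValueError("both score lists must be non-empty")
    seen_avg = mean(seen_scores)
    unseen_avg = mean(unseen_scores)
    return {
        "seen_mean": seen_avg,
        "unseen_mean": unseen_avg,
        # Positive drop means the model scores lower on unseen instructions.
        "drop": seen_avg - unseen_avg,
        # Spread across unseen phrasings; larger means less robust.
        "unseen_spread": pstdev(unseen_scores),
    }


if __name__ == "__main__":
    # Made-up scores, not results from the paper.
    summary = robustness_summary([0.5, 0.75], [0.25, 0.5, 0.75])
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

Running the same summary separately for relation extraction and question answering instructions would give one way to compare robustness across the two task types.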
Abstract
Recently, instruction fine-tuning has risen to prominence as a potential method for enhancing the zero-shot capabilities of Large Language Models (LLMs) on novel tasks. This technique has shown an exceptional ability to boost the performance of moderately sized LLMs, sometimes even reaching performance levels comparable to those of much larger model variants. The focus is on the robustness of instruction-tuned LLMs to seen and unseen tasks. We conducted an exploration of six models, including Alpaca, Vicuna, WizardLM, and traditional task-oriented models (Flan-T5-XL/XXL, T0++), using real-world relation extraction datasets as case studies. We carried out a comprehensive evaluation of these instruction-following LLMs, which have been tuned on open-domain instructions and task-oriented instructions. The main discussion concerns their performance and robustness to instructions. We have…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Flan-T5 · Focus
