Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

Guanting Dong, Jinxu Zhao, Tingfeng Hui, Daichi Guo, Wenlong Wan, Boqi Feng, Yueyan Qiu, Zhuoma Gongque, Keqing He, Zechen Wang, Weiran Xu

arXiv:2310.06504·cs.CL·October 11, 2023

TL;DR

This paper introduces a comprehensive framework and dataset for evaluating the robustness of large language models in noisy slot-filling tasks, revealing current models' limited robustness and suggesting future research directions.

Contribution

It proposes a unified robustness evaluation framework, constructs the Noise-LLM dataset with various perturbations, and designs data augmentation and prompt strategies for systematic assessment.

Findings

01. Open-source LLMs show limited robustness to input perturbations.

02. The Noise-LLM dataset covers diverse single and mixed perturbation types.

03. Experimental results highlight the need for improved robustness methods.

Abstract

With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data…
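The abstract mentions character-level augmentation but, in this excerpt, does not say how it is performed. As an illustration only, the following minimal sketch shows one way character-level noise could be applied to a slot-filling example. It is not the authors' method: the function names, the adjacent-character swap used as a typo model, the `rate` and `seed` parameters, and the example utterance are all assumptions.

```python
import random


def swap_adjacent_chars(token: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent characters (assumed typo model)."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb_utterance(tokens, labels, rate=0.3, seed=0):
    """Apply character-level noise to each token with probability `rate`.

    The number of tokens never changes, so the slot labels stay aligned
    with the noisy tokens and are returned as an unchanged copy.
    """
    rng = random.Random(seed)
    noisy = [
        swap_adjacent_chars(tok, rng) if rng.random() < rate else tok
        for tok in tokens
    ]
    return noisy, list(labels)


# Hypothetical example utterance with BIO-style slot labels.
tokens = ["book", "a", "flight", "to", "boston"]
labels = ["O", "O", "O", "O", "B-destination"]
noisy_tokens, noisy_labels = perturb_utterance(tokens, labels, rate=0.5, seed=42)
print(noisy_tokens)
print(noisy_labels)
```

Because the swap keeps the token count fixed, the gold slot labels can be reused when scoring a model on the noisy input. Word-level or sentence-level perturbations that insert or delete tokens would need the labels to be realigned.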

Code & Models

Repositories

dongguanting/noise-slot-filling-llm (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems