Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

Robert Litschko; Max Müller-Eberstein; Rob van der Goot; Leon Weber; Barbara Plank

arXiv:2310.05442·cs.CL·October 24, 2023

TL;DR

This paper advocates for a holistic approach to evaluating NLP models, emphasizing trustworthiness and reliability over traditional task-specific metrics, especially in the context of large language models and real-world applications.

Contribution

The paper proposes rethinking how NLP tasks are defined and evaluated, centering trustworthiness and holistic assessment for large language models.

Findings

01. Traditional task-based evaluation is insufficient for LLMs.

02. Multi-faceted evaluation protocols are recommended.

03. Trustworthiness should be central in NLP model assessment.

Abstract

Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence have been compartmentalized into tasks with specialized model architectures and corresponding evaluation protocols. With the advent of large language models (LLMs) the community has witnessed a dramatic shift towards general purpose, task-agnostic approaches powered by generative models. As a consequence, the traditional compartmentalized notion of language tasks is breaking down, followed by an increasing challenge for evaluation and analysis. At the same time, LLMs are being deployed in more real-world scenarios, including previously unforeseen zero-shot setups, increasing the need for trustworthy and reliable systems. Therefore, we argue that it is time to rethink what…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques