Evaluating Large Language Models with fmeval

Pola Schwöbel; Luca Franceschi; Muhammad Bilal Zafar; Keerthan Vasist; Aman Malhotra; Tomer Shenhar; Pinal Tailor; Pinar Yilmaz; Michael Diamond; Michele Donini

arXiv:2407.12872·cs.CL·July 19, 2024

TL;DR

fmeval is an open source library designed to evaluate large language models across various tasks and responsible AI dimensions, emphasizing simplicity, coverage, extensibility, and performance.

Contribution

This paper introduces fmeval, an evaluation library for LLMs, details its design principles and implementation, and demonstrates a typical use case: selecting a suitable model for a question answering task.

Findings

01 Effective evaluation of LLMs across multiple tasks
02 Facilitates responsible AI assessment
03 Supports model selection for specific applications

Abstract

fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
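The model-selection use case mentioned in the abstract can be illustrated with a minimal sketch. It does not use fmeval's API: the two "models" are hypothetical stand-in functions, the dataset is a two-item toy example, and the metrics (exact match and token-level F1) are common question answering scores chosen here as an assumption, not a statement of what the library computes.

```python
import re
import string
from collections import Counter
from typing import Callable, Dict, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, target: str) -> float:
    """1.0 if the normalized prediction equals the normalized target, else 0.0."""
    return float(normalize(prediction) == normalize(target))


def token_f1(prediction: str, target: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, gold = normalize(prediction), normalize(target)
    if not pred or not gold:
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average both metrics over (question, reference answer) pairs."""
    em_total, f1_total = 0.0, 0.0
    for question, target in dataset:
        answer = model(question)  # one model call per question
        em_total += exact_match(answer, target)
        f1_total += token_f1(answer, target)
    n = len(dataset)
    return {"exact_match": em_total / n, "f1": f1_total / n}


# Toy dataset and hypothetical stand-in models (assumptions for illustration).
dataset = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]


def model_a(question: str) -> str:
    return {"What is the capital of France?": "Paris.",
            "Who wrote Hamlet?": "Shakespeare"}[question]


def model_b(question: str) -> str:
    return {"What is the capital of France?": "The capital is Paris",
            "Who wrote Hamlet?": "Christopher Marlowe"}[question]


scores = {name: evaluate(fn, dataset)
          for name, fn in [("model_a", model_a), ("model_b", model_b)]}
best = max(scores, key=lambda name: scores[name]["f1"])  # pick by mean F1
print(scores, best)
```

Selecting on F1 rather than exact match is a design choice in this sketch: F1 gives partial credit to answers that contain the reference tokens with extra or missing words, which matters when models differ in verbosity.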

Code & Models

Repositories

aws/fmeval (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Lib