RETAIN: Interactive Tool for Regression Testing Guided LLM Migration

Tanay Dixit; Daniel Lee; Sally Fang; Sai Sree Harsha; Anirudh Sureshan; Akash Maharaj; Yunyao Li

arXiv:2409.03928 · cs.IR · September 9, 2024

TL;DR

RETAIN is an interactive tool designed to assist developers in regression testing during LLM migrations, helping identify errors and differences in model outputs more efficiently than manual methods.

Contribution

The paper introduces RETAIN, a novel regression testing tool for LLM migrations that combines an interactive interface with an error discovery module, improving error detection and prompt experimentation.

Findings

1. RETAIN enabled participants to identify twice as many errors as manual evaluation.

2. Participants could experiment with 75% more prompts using RETAIN.

3. Participants using RETAIN achieved 12% higher metric scores within the same time frame.

Abstract

Large Language Models (LLMs) are increasingly integrated into diverse applications. The rapid evolution of LLMs presents opportunities for developers to enhance applications continuously. However, this constant adaptation can also lead to performance regressions during model migrations. While several interactive tools have been proposed to streamline the complexity of prompt engineering, few address the specific requirements of regression testing for LLM migrations. To bridge this gap, we introduce RETAIN (REgression Testing guided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM migrations. RETAIN comprises two key components: an interactive interface tailored to regression testing needs during LLM migrations, and an error discovery module that facilitates understanding of differences in model behaviors. The error discovery module generates textual descriptions…
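
The core check behind regression testing in a migration can be illustrated with a small sketch. This is not RETAIN's implementation, which this summary does not describe. It is a minimal, hypothetical example assuming each test example has a reference answer, both the old and the new model have already produced outputs, and a per-example metric is available. The names `find_regressions`, `exact_match`, and `tolerance` are illustrative assumptions, not part of RETAIN's API. The error discovery module, which generates textual descriptions of behavioral differences, is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Regression:
    """One test example where the migrated (new) model scored worse."""
    example_id: str
    old_score: float
    new_score: float


def exact_match(output: str, reference: str) -> float:
    """Hypothetical metric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def find_regressions(
    old_outputs: Dict[str, str],
    new_outputs: Dict[str, str],
    references: Dict[str, str],
    metric: Callable[[str, str], float],
    tolerance: float = 0.0,
) -> List[Regression]:
    """Score both models on every example and flag drops larger than `tolerance`."""
    regressions = []
    for example_id, reference in references.items():
        old_score = metric(old_outputs[example_id], reference)
        new_score = metric(new_outputs[example_id], reference)
        # Improvements and ties are ignored; only score drops count as regressions.
        if new_score < old_score - tolerance:
            regressions.append(Regression(example_id, old_score, new_score))
    return regressions


# Toy data (illustrative only): q2 regresses, q3 improves, q1 is unchanged.
references = {"q1": "Paris", "q2": "4", "q3": "blue"}
old_outputs = {"q1": "Paris", "q2": "4", "q3": "red"}
new_outputs = {"q1": "paris", "q2": "5", "q3": "blue"}

regressions = find_regressions(old_outputs, new_outputs, references, exact_match)
for r in regressions:
    print(f"{r.example_id}: {r.old_score} -> {r.new_score}")
```

Only examples whose score drops are reported, because in a migration the question is what the new model breaks; a fuller tool would also surface improvements and group the flagged examples so a developer can revise the prompt and re-run the comparison.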

Videos

RETAIN: Interactive Tool for Regression Testing Guided LLM Migration (Underline)

Taxonomy

Topics: Natural Language Processing Techniques