Metamorphic Evaluation of ChatGPT as a Recommender System

Madhurima Khirbat; Yongli Ren; Pablo Castells; Mark Sanderson

arXiv:2411.12121 · cs.IR · November 20, 2024

TL;DR

This paper introduces a metamorphic testing framework for evaluating ChatGPT-based recommender systems, whose probabilistic, black-box nature calls for specialized evaluation methods.

Contribution

It proposes a novel metamorphic testing approach for LLM-based recommenders, addressing the limitations of traditional evaluation metrics for these models.

Findings

01

Lower similarity scores indicate inconsistencies in GPT-based recommendations

02

Traditional metrics are insufficient for evaluating LLM-based recommender systems

03

Metamorphic testing reveals the need for comprehensive evaluation methods

Abstract

With the rise of Large Language Models (LLMs) such as ChatGPT, researchers have been working on how to use LLMs for better recommendations. However, although LLMs are black-box (their internal workings are not visible) and probabilistic, the evaluation frameworks used to assess LLM-based recommender systems (RS) are the same as those used for traditional recommender systems. To address this gap, we introduce metamorphic testing for the evaluation of GPT-based RS. This testing technique involves defining metamorphic relations (MRs) between inputs and checking whether the relations are satisfied in the outputs. Specifically, we examined MRs from both the RS and LLM perspectives, including rating multiplication/shifting in RS and adding spaces/randomness to the LLM prompt via prompt perturbation. Similarity metrics (e.g. Kendall…
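The rating multiplication/shifting relations described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_recommender` is a hypothetical deterministic stand-in for a GPT-based recommender, the helper names are assumptions, and the Kendall tau shown is the simple tau-a variant for two rankings of the same items with no ties and at least two items.

```python
from itertools import combinations
from typing import Callable, Dict, List

Ratings = Dict[str, float]
Recommender = Callable[[Ratings], List[str]]


def kendall_tau(a: List[str], b: List[str]) -> float:
    """Kendall tau-a between two rankings of the same items (no ties)."""
    pos_b = {item: i for i, item in enumerate(b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) where x is ranked before y in `a`.
    for x, y in combinations(a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def multiply_ratings(ratings: Ratings, k: float) -> Ratings:
    """Follow-up input for the multiplication MR: scale every rating by k > 0."""
    return {item: value * k for item, value in ratings.items()}


def shift_ratings(ratings: Ratings, c: float) -> Ratings:
    """Follow-up input for the shifting MR: add a constant c to every rating."""
    return {item: value + c for item, value in ratings.items()}


def toy_recommender(ratings: Ratings) -> List[str]:
    """Hypothetical stand-in: rank items by rating (descending), then by name."""
    return sorted(ratings, key=lambda item: (-ratings[item], item))


def mr_similarity(recommender: Recommender, ratings: Ratings,
                  transform: Callable[[Ratings], Ratings]) -> float:
    """Similarity between the source output and the follow-up output."""
    source_output = recommender(ratings)
    follow_up_output = recommender(transform(ratings))
    return kendall_tau(source_output, follow_up_output)


ratings = {"A": 5, "B": 3, "C": 4}
tau_multiply = mr_similarity(toy_recommender, ratings, lambda r: multiply_ratings(r, 2))
tau_shift = mr_similarity(toy_recommender, ratings, lambda r: shift_ratings(r, 1))
```

Because both transformations preserve the relative order of ratings, a recommender that depends only on that order should return the same ranking, giving a similarity of 1.0. A lower value on a real system would indicate that the relation is violated.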

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare