SaGE: Evaluating Moral Consistency in Large Language Models

Vamshi Krishna Bonagiri; Sreeram Vennam; Priyanshul Govil; Ponnurangam Kumaraguru; Manas Gaur

arXiv:2402.13709 · cs.CL · March 11, 2024 · 3 citations

TL;DR

This paper introduces SaGE, an information-theoretic measure to evaluate moral consistency in large language models, revealing their inconsistency in moral responses and emphasizing the need for separate assessment of accuracy and consistency.

Contribution

We propose SaGE, a novel measure based on Semantic Graph Entropy, and create the Moral Consistency Corpus to evaluate and analyze moral consistency in LLMs.

Findings

01. LLMs show significant moral inconsistency.

02. Task accuracy and moral consistency are independent.

03. SaGE effectively measures moral consistency across datasets.

Abstract

Despite recent advancements showcasing the impressive capabilities of Large Language Models (LLMs) in conversational systems, we show that even state-of-the-art LLMs are morally inconsistent in their generations, questioning their reliability (and trustworthiness in general). Prior works in LLM evaluation focus on developing ground-truth data to measure accuracy on specific tasks. However, for moral scenarios that often lack universally agreed-upon answers, consistency in model responses becomes crucial for their reliability. To address this issue, we propose an information-theoretic measure called Semantic Graph Entropy (SaGE), grounded in the concept of "Rules of Thumb" (RoTs) to measure a model's moral consistency. RoTs are abstract principles learned by a model and can help explain their decision-making strategies effectively. To this extent, we construct the Moral Consistency…
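To make the idea of an entropy-based consistency score concrete, here is a minimal sketch. It is not the SaGE formula from the paper, which this page does not reproduce. It assumes a toy setup: responses to paraphrases of one question are grouped by word overlap (a stand-in for semantic similarity), and consistency is one minus the normalized Shannon entropy of the group sizes. The names `jaccard`, `cluster_responses`, and `consistency_score`, and the `threshold` parameter, are hypothetical.

```python
import math
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_responses(responses, threshold=0.5):
    """Greedily assign each response to the first cluster whose
    representative (its first member) is similar enough."""
    representatives = []
    labels = []
    for response in responses:
        for index, rep in enumerate(representatives):
            if jaccard(response, rep) >= threshold:
                labels.append(index)
                break
        else:
            # No existing cluster matched, so start a new one.
            representatives.append(response)
            labels.append(len(representatives) - 1)
    return labels


def consistency_score(responses, threshold=0.5):
    """Return 1 minus the normalized entropy of the cluster sizes.
    1.0 means every response fell into one cluster; 0.0 means every
    response formed its own cluster."""
    n = len(responses)
    if n <= 1:
        return 1.0
    counts = Counter(cluster_responses(responses, threshold))
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)


if __name__ == "__main__":
    answers = [
        "lying is wrong",
        "lying is wrong",
        "always tell lies freely",
    ]
    print(round(consistency_score(answers), 3))
```

A score like this depends only on how the responses agree with each other, not on whether any of them is correct, which is why consistency can be assessed separately from task accuracy.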

Code & Models

Repositories

vnnm404/SaGE (PyTorch, official)

Taxonomy

Topics: Topic Modeling

Methods: Focus