Inducing Human-like Biases in Moral Reasoning Language Models

Artem Karpov; Seong Hah Cho; Austin Meek; Raymond Koopmanschap; Lucy Farnik; Bogdan-Ionut Cirstea

arXiv:2411.15386 · cs.AI · November 26, 2024

TL;DR

This paper investigates whether fine-tuning language models on human moral reasoning data (ETHICS benchmark labels and/or fMRI recordings) brings their activations closer to human brain activity, as measured by BrainScore. It finds no significant improvement in BrainScore after fine-tuning.

Contribution

The paper fine-tunes BERT, RoBERTa, and DeBERTa on moral reasoning behavioral data, fMRI data, or both. Larger models generally score higher on both ETHICS accuracy and BrainScore, but fine-tuning does not significantly improve BrainScore.

Findings

01 Larger models outperform smaller ones on moral reasoning tasks.

02 Fine-tuning on fMRI data does not significantly increase BrainScore.

03 Model accuracy improves with size, but alignment with brain data remains limited.

Abstract

In this work, we study the alignment (BrainScore) of large language models (LLMs) fine-tuned for moral reasoning on behavioral data and/or brain data of humans performing the same task. We also explore if fine-tuning several LLMs on the fMRI data of humans performing moral reasoning can improve the BrainScore. We fine-tune several LLMs (BERT, RoBERTa, DeBERTa) on moral reasoning behavioral data from the ETHICS benchmark [Hendrycks et al., 2020], on the moral reasoning fMRI data from Koster-Hale et al. [2013], or on both. We study both the accuracy on the ETHICS benchmark and the BrainScores between model activations and fMRI data. While larger models generally performed better on both metrics, BrainScores did not significantly improve after fine-tuning.
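The abstract does not spell out how a BrainScore is computed. As a rough illustration only, the sketch below uses one common formulation: fit a ridge regression from model activations to voxel responses on a training split, then report the mean Pearson correlation between predicted and measured responses on held-out samples. The function names, the regularization strength `alpha`, the 80/20 split, and the synthetic data are all assumptions made for this example, not the authors' pipeline, and fMRI-specific steps such as hemodynamic delay handling are left out.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    # A constant column has zero numerator, so it yields a correlation of 0.
    return (A * B).sum(axis=0) / np.where(denom == 0, 1.0, denom)


def brain_score(activations, voxels, alpha=1.0, train_fraction=0.8, seed=0):
    """Mean held-out correlation between predicted and measured voxels.

    activations: (n_samples, n_features) model activations (hypothetical input)
    voxels:      (n_samples, n_voxels) fMRI responses (hypothetical input)
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(activations.shape[0])
    n_train = int(train_fraction * len(order))
    train, test = order[:n_train], order[n_train:]

    # Center with training-set statistics only, to avoid leaking test data.
    x_mean = activations[train].mean(axis=0)
    y_mean = voxels[train].mean(axis=0)
    W = ridge_fit(activations[train] - x_mean, voxels[train] - y_mean, alpha)

    predicted = (activations[test] - x_mean) @ W + y_mean
    return float(pearson_per_column(predicted, voxels[test]).mean())


# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(42)
acts = rng.normal(size=(200, 16))
mixing = rng.normal(size=(16, 10))
related = acts @ mixing + 0.1 * rng.normal(size=(200, 10))  # linearly driven by acts
unrelated = rng.normal(size=(200, 10))                      # independent of acts

print("related voxels:  ", brain_score(acts, related))
print("unrelated voxels:", brain_score(acts, unrelated))
```

On this synthetic data the score is high when the voxel responses are a linear function of the activations and close to zero when they are independent. Under this formulation, the paper's question is whether fine-tuning raises the score on real fMRI data.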

Taxonomy

Topics: Psychology of Moral and Emotional Judgment

Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Adam · Residual Connection · Weight Decay · Softmax · Multi-Head Attention