The Moral Foundations Reddit Corpus

Jackson Trager; Alireza S. Ziabari; Elnaz Rahmati; Aida Mostafazadeh Davani; Preni Golazizian; Farzan Karimi-Malekabadi; Ali Omrani; Zhihe Li; Brendan Kennedy; Georgios Chochlakis; Nils Karl Reimer; Melissa Reyes; Kelsey Cheng; Mellow Wei; Christina Merrifield; Arta Khosravi; Evans Alvarez; Morteza Dehghani

arXiv:2208.05545 · cs.CL · March 19, 2026 · 23 cites

TL;DR

This paper introduces a large, hand-annotated Reddit comment dataset based on Moral Foundations Theory to improve computational understanding of moral sentiment, and evaluates various language models on this subjective task.

Contribution

It presents the first extensive Reddit moral sentiment corpus annotated with multiple moral categories, facilitating research in NLP and social sciences, and benchmarks LLM performance on this dataset.

Findings

01. LLMs lag behind fine-tuned encoders in moral sentiment tasks.

02. Human-annotated corpora remain essential for AI alignment evaluation.

03. The dataset enables better understanding of moral rhetoric in online discourse.

Abstract

Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, environmental action, political engagement, and protest. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment from textual data, but achieving strong performance in such subjective tasks requires large, hand-annotated datasets. Previous corpora annotated for moral sentiment have proven valuable, and have generated new insights both within NLP and across the social sciences, but have been limited to Twitter. To facilitate improving our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 English Reddit comments that have been curated from 12 distinct subreddits, hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care,…
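Because each comment is labeled by at least three annotators, anyone using the corpus has to decide how to combine those labels. The sketch below shows one common option, a per-label majority vote. It is an illustration only: the abstract does not say how the authors aggregate annotations, the function name `majority_labels` and the `threshold` parameter are hypothetical, and `"PlaceholderB"` stands in for a category that the truncated abstract does not name.

```python
from collections import Counter


def majority_labels(annotations, threshold=0.5):
    """Return the labels chosen by more than `threshold` of the annotators.

    `annotations` is a list with one collection of labels per annotator.
    This is an assumed aggregation rule, not the corpus authors' method.
    """
    if not annotations:
        return set()
    # Count each label at most once per annotator.
    counts = Counter(label for labels in annotations for label in set(labels))
    n_annotators = len(annotations)
    return {label for label, c in counts.items() if c / n_annotators > threshold}


# Hypothetical comment labeled by three annotators.
example = [{"Care"}, {"Care", "PlaceholderB"}, {"PlaceholderB", "Care"}]
aggregated = majority_labels(example)
```

A strict majority keeps only labels that most annotators agree on. For a task this subjective, it may be just as reasonable to keep the individual annotations and model the disagreement directly.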

Code & Models

Repositories

LuanaBulla/Detection-of-Morality-in-Text
pytorch

Datasets

USC-MOLA-Lab/MFRC
dataset · 306 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts