Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
Luana Bulla, Aldo Gangemi, Misael Mongiovì

TL;DR
This paper explores the use of large language models, especially GPT 3.5, as zero-shot classifiers for detecting moral values in text, comparing their performance with supervised models across different domains.
Contribution
It applies the GPT 3.5 Davinci model as a zero-shot moral value detector and compares it with a smaller NLI-based zero-shot model and with supervised approaches to assess cross-domain robustness.
Findings
GPT 3.5 achieves competitive results in moral value detection without labeled training data
Unsupervised NLI-based models perform comparably to larger models
Supervised models struggle with cross-domain generalization
Abstract
The task of detecting moral values in text has significant implications in various fields, including natural language processing, social sciences, and ethical decision-making. Previously proposed supervised models often suffer from overfitting, leading to hyper-specialized moral classifiers that struggle to perform well on data from different domains. To address this issue, we introduce novel systems that leverage abstract concepts and common-sense knowledge acquired from Large Language Models and Natural Language Inference models during previous stages of training on multiple data sources. By doing so, we aim to develop versatile and robust methods for detecting moral values in real-world scenarios. Our approach uses the GPT 3.5 model as a zero-shot ready-made unsupervised multi-label classifier for moral values detection, eliminating the need for explicit training on labeled data. We…
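The zero-shot multi-label setup described in the abstract can be illustrated with a minimal sketch. It only shows how a prompt might be composed and how a free-text reply could be mapped to a label vector; it does not call any model. The label set (taken from Moral Foundations Theory), the prompt wording, and the helper names `build_prompt` and `parse_reply` are all assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List

# Assumed label set (Moral Foundations Theory); the abstract does not list the labels.
MORAL_LABELS: List[str] = ["care", "fairness", "loyalty", "authority", "purity"]


def build_prompt(text: str, labels: List[str] = MORAL_LABELS) -> str:
    """Compose a zero-shot instruction asking for every applicable label.

    The wording is hypothetical; it is not the prompt used by the authors.
    """
    options = ", ".join(labels)
    return (
        "Which of the following moral values are expressed in the text? "
        f"Options: {options}. "
        "Answer with a comma-separated list, or 'none'.\n"
        f"Text: {text}\nAnswer:"
    )


def parse_reply(reply: str, labels: List[str] = MORAL_LABELS) -> Dict[str, int]:
    """Map a free-text model reply to a multi-label 0/1 vector."""
    # Split on commas, then normalize case and strip a trailing period.
    mentioned = {part.strip().lower().rstrip(".") for part in reply.split(",")}
    return {label: int(label in mentioned) for label in labels}
```

In use, the string returned by `build_prompt` would be sent to a language model of choice, and the model's reply passed to `parse_reply`. Because each label is scored independently, a text can receive several labels or none, which is what makes the setup multi-label.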
Taxonomy
Topics: Psychology of Moral and Emotional Judgment · Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Adam · Residual Connection · Multi-Head Attention · Dropout · Dense Connections · Cosine Annealing
