Enhancing Moral Diagnosis and Correction in Large Language Models
Bocheng Chen, Xi Chen, Han Zi, Haitao Mao, Zimo Qi, Xitong Zhang, Kristen Johnson, Guangliang Liu

TL;DR
This paper presents a pragmatic inference-based method to improve moral error detection and correction in large language models, demonstrating broad applicability and superior performance across diverse moral and social tasks.
Contribution
It introduces a unifying variable, pragmatic inference load, enabling generalization across tasks and enhancing LLMs' moral diagnostic and correction capabilities.
Findings
Outperforms baseline methods in moral error correction
Generalizes across diverse moral and social tasks
Improvements stem from learned inferential processes
Abstract
Identifying specific moral errors in an input and generating appropriate corrections requires moral sensitivity in large language models (LLMs), a capability that is fundamental to their moral performance yet challenging to achieve. This study uses a pragmatic inference-based approach to improve both the moral diagnosis and the corrections that models produce. Crucially, our method generalizes across diverse tasks, including moral reasoning, toxic language detection, social bias detection, and jailbreaks, despite substantial differences in their semantic formulations. To enable such generalization, the study also introduces a unifying variable, pragmatic inference load, which captures the degree of pragmatic reasoning required across tasks. Experimental results show that our approach enables LLMs to produce high-quality diagnostic outputs of moral errors, make effective…
Code & Models
- 🤗 MoralMachine/Diagnose-and-Correct-for-Jailbreak-Llama-3-2-1B · model · 4 downloads
- 🤗 MoralMachine/Diagnose-and-Correct-for-Jailbreak-Llama-3.2-3B · model · 2 downloads
- 🤗 MoralMachine/Diagnose-and-Correct-for-Toxicity-Llama-3-2-1B · model · 4 downloads
- 🤗 MoralMachine/Diagnose-and-Correct-for-Toxicity-Llama-3.2-3B · model · 2 downloads
- 🤗 MoralMachine/Diagnose-and-Correct-for-SocialBias-Llama-3-2-3B · model · 4 downloads
- 🤗 MoralMachine/Diagnose-and-Correct-for-Socialbias-Llama-3-2-1B · model · 5 downloads · ♡ 1
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment
