Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
Yuxi Sun, Wei Gao, Hongzhan Lin, Jing Ma, Wenxuan Zhang

TL;DR
This paper introduces ClarityEthic, a novel approach that enhances AI's ability to assess and explain human behaviors by generating conflicting social norms, improving moral reasoning and interpretability.
Contribution
The paper presents ClarityEthic, a new method that generates conflicting social norms to improve ethical assessment and explanation in language models.
Findings
ClarityEthic outperforms baseline approaches in valence prediction.
Generated norms are plausible and enhance interpretability.
Human evaluations confirm the quality of explanations.
Abstract
Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying an action "report a witnessed crime" are social norms that inform our conduct, such as "It is expected to be brave to report crimes". Current AI systems that assess valence (i.e., support or oppose) of human actions by leveraging large-scale data training not grounded on explicit norms may be difficult to explain, and thus untrustworthy. Emulating human assessors by considering social norms can help AI models better understand and predict valence. While multiple norms come into play, conflicting norms can create tension and directly influence human behavior. For example, when deciding whether to "report a witnessed crime", one may balance bravery against self-protection. In this paper, we introduce…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Multimodal Machine Learning Applications
