LLMs grasp morality in concept
Mark Pock, Andre Ye, Jared Moore

TL;DR
This paper develops a general theory of meaning for LLMs and argues that they already grasp social constructs such as morality in concept, which challenges current alignment methods and suggests new directions for AI ethics.
Contribution
It introduces a general theory of meaning that extends beyond humans, uses it to characterize LLMs as meaning-agents, and argues that they already grasp social concepts, which calls existing alignment practices into question.
Findings
LLMs, as meaning-agents, already grasp social constructs such as morality, gender, and race in concept.
Under certain ethical frameworks, currently popular alignment methods are limited at best and counterproductive at worst.
Unaligned models may help develop moral and social philosophy.
Abstract
Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.
Taxonomy
Topics: Ethics and Social Impacts of AI
