Morality in AI. A plea to embed morality in LLM architectures and frameworks
Gunter Bombaerts, Bram Delisse, Uzay Kaymak

TL;DR
This paper advocates for embedding moral understanding directly into the architecture of large language models using top-down design principles inspired by biological attention and philosophical theories, aiming to enhance ethical decision-making in AI.
Contribution
It introduces a novel framework that conceptualizes attention as a dynamic system mediating structure and processing, and proposes technical pathways for embedding morality into LLM architectures.
Findings
Attention as a dynamic mediating system
Pathways for embedding morality via training and architecture
Complementarity of architectural and external ethical methods
Abstract
Large language models (LLMs) increasingly mediate human decision-making and behaviour. Ensuring LLM processing of moral meaning has therefore become a critical challenge. Current approaches rely predominantly on bottom-up methods such as fine-tuning and reinforcement learning from human feedback. We propose a fundamentally different approach: embedding the processing of moral meaning directly into the architectural mechanisms and frameworks of transformer-based models through top-down design principles. We first sketch a framework that conceptualizes attention as a dynamic interface mediating between structure and processing, in contrast to existing linear attention frameworks in psychology. We start from established biological-artificial attention analogies in neural architecture design to improve cognitive processing. We extend this analysis to moral processing, using Iris Murdoch's theory…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition
