Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety
Matthew Brophy

TL;DR
This paper proposes using the Method of Wide Reflective Equilibrium (MWRE), a framework from moral epistemology, to improve the ethical robustness and procedural legitimacy of Large Language Model (LLM) alignment techniques such as Constitutional AI (CAI).
Contribution
It introduces MWRE as a novel framework for analyzing and enhancing LLM alignment, placing greater emphasis on dynamic revision and procedural legitimacy than existing methods do.
Findings
MWRE offers a more ethically grounded approach to LLM alignment than current techniques.
Current methods like CAI resemble MWRE but lack its iterative revision process.
MWRE provides a heuristic for developing more ethically justifiable AI systems.
Abstract
As large language models (LLMs) become more powerful and pervasive across society, ensuring these systems are beneficial, safe, and aligned with human values is crucial. Current alignment techniques, like Constitutional AI (CAI), involve complex iterative processes. This paper argues that the Method of Wide Reflective Equilibrium (MWRE) -- a well-established coherentist moral methodology -- offers a uniquely apt framework for understanding current LLM alignment efforts. Moreover, this methodology can substantively augment these processes by providing concrete pathways for improving their dynamic revisability, procedural legitimacy, and overall ethical grounding. Together, these enhancements can help produce more robust and ethically defensible outcomes. MWRE, emphasizing the achievement of coherence between our considered moral judgments, guiding moral principles, and relevant…
Taxonomy
Topics: Law, AI, and Intellectual Property · Artificial Intelligence in Law · Ethics and Social Impacts of AI
