Scaling Laws for Moral Machine Judgment in Large Language Models
Kazuhiro Takemoto

TL;DR
This study shows that the moral judgments of larger language models align more closely with human preferences, and that the improvement follows a predictable power-law relationship across diverse architectures.
Contribution
It provides the first empirical evidence of scaling laws governing moral judgment capabilities in large language models, extending scaling law research to value-based tasks.
Findings
Moral judgment alignment improves with model size following a power-law.
Extended reasoning models show better moral alignment, with the largest gains among smaller models.
Variance in moral judgment decreases at larger scales, indicating more reliable judgments.
Abstract
Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship, with distance from human preferences decreasing as model size increases. Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show significantly better alignment, with this effect being more pronounced in smaller models (a size × reasoning interaction). The relationship holds across diverse architectures, while variance decreases at…
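As a rough illustration of what fitting a power law of this kind involves, the sketch below estimates an exponent by ordinary least squares in log-log space. It is not the paper's analysis: the function name `fit_power_law`, the prefactor, the exponent, and the noise level are all assumptions chosen for the example, and the data are synthetic. Only the count of 75 configurations and the 0.27B to 1000B size range are taken from the abstract.

```python
import numpy as np


def fit_power_law(sizes, distances):
    """Fit distance = a * size**(-alpha) by least squares on log-transformed data.

    Returns (a, alpha, r_squared), where r_squared is computed in log-log space.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_d = np.log(np.asarray(distances, dtype=float))

    # A power law is a straight line in log-log space:
    # log(distance) = log(a) - alpha * log(size)
    slope, intercept = np.polyfit(log_n, log_d, 1)

    predicted = slope * log_n + intercept
    ss_res = np.sum((log_d - predicted) ** 2)
    ss_tot = np.sum((log_d - log_d.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    return float(np.exp(intercept)), float(-slope), float(r_squared)


# Synthetic data for illustration only; none of these numbers come from the paper.
rng = np.random.default_rng(0)
sizes = np.logspace(np.log10(0.27), np.log10(1000.0), 75)  # billions of parameters
assumed_a, assumed_alpha = 1.0, 0.1                        # hypothetical values
noise = np.exp(rng.normal(0.0, 0.05, sizes.size))          # multiplicative noise
distances = assumed_a * sizes ** (-assumed_alpha) * noise

a_hat, alpha_hat, r_squared = fit_power_law(sizes, distances)
print(f"a = {a_hat:.3f}, alpha = {alpha_hat:.3f}, R^2 = {r_squared:.3f}")
```

A plain log-log regression like this ignores grouping by model family. The abstract's mixed-effects models address that by controlling for family and reasoning capability, which this sketch does not attempt.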
