How Utilitarian Are OpenAI's Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025)
Johannes Himmelreich

TL;DR
This study replicates and extends prior research on OpenAI's models' moral reasoning, revealing that prompt framing significantly influences utilitarian responses and emphasizing the need for multi-prompt robustness testing in LLM evaluations.
Contribution
The paper critically re-evaluates previous findings on OpenAI models' utilitarian responses, demonstrating the impact of prompt framing and advocating for multi-prompt testing as a standard evaluation practice.
Findings
GPT-4o's utilitarian responses depend heavily on prompt framing.
Models often refuse to answer or give non-utilitarian answers when prompted differently.
Single-prompt evaluations are unreliable without robustness checks.
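One way to picture the robustness check these findings call for is to tally responses per prompt variant and report refusals separately from substantive answers. The sketch below is only an illustration under stated assumptions: the three labels, the variant names `should_i` and `permissible`, the `summarize` helper, and the toy records are all hypothetical and are not the paper's coding scheme or data.

```python
from collections import Counter, defaultdict

# Hypothetical response labels; the paper's actual coding scheme may differ.
LABELS = ("utilitarian", "non_utilitarian", "refusal")


def summarize(records):
    """Tally (variant, label) pairs into per-variant rates.

    Reports the utilitarian rate twice: over all responses, and over
    answered (non-refusal) responses only, so refusals are not mistaken
    for non-utilitarian answers.
    """
    counts = defaultdict(Counter)
    for variant, label in records:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[variant][label] += 1

    summary = {}
    for variant, c in counts.items():
        total = sum(c.values())
        answered = total - c["refusal"]
        summary[variant] = {
            "n": total,
            "refusal_rate": c["refusal"] / total,
            "utilitarian_rate_all": c["utilitarian"] / total,
            # Undefined when every response was a refusal.
            "utilitarian_rate_answered": (
                c["utilitarian"] / answered if answered else None
            ),
        }
    return summary


# Toy records invented for illustration only (not results from the paper).
toy = (
    [("should_i", "refusal")] * 6
    + [("should_i", "utilitarian")] * 3
    + [("should_i", "non_utilitarian")] * 1
    + [("permissible", "utilitarian")] * 9
    + [("permissible", "non_utilitarian")] * 1
)

for variant, stats in summarize(toy).items():
    print(variant, stats)
```

In this toy data the `should_i` variant looks weakly utilitarian when refusals are counted in the denominator, but much more so among answered responses, which is the kind of gap a single-prompt evaluation would hide.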
Abstract
Pfeffer, Krügel, and Uhl (2025) report that OpenAI's reasoning model o1-mini produces more utilitarian responses to the trolley problem and footbridge dilemma than the non-reasoning model GPT-4o. I replicate their study with four current OpenAI models and extend it with prompt variant testing. The trolley finding does not survive: GPT-4o's low utilitarian rate doesn't reflect a deontological commitment but safety refusals triggered by the prompt's advisory framing. When framed as "Is it morally permissible...?" instead of "Should I...?", GPT-4o gives 99% utilitarian responses. All models converge on utilitarian answers when prompt confounds are removed. The footbridge finding survives with blemishes. Reasoning models tend to give more utilitarian responses than non-reasoning models across prompt variations. But often they refuse to answer the dilemma or, when they answer, give a…
Taxonomy
Topics: Psychology of Moral and Emotional Judgment · Ethics and Social Impacts of AI · Epistemology, Ethics, and Metaphysics
