StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models

Ishmam Khan; Sindhuja Thogarrati; Shuo Zhang

arXiv:2605.11483 · cs.CL · May 13, 2026

TL;DR

This paper explores how small language models can be aligned with Stoic philosophy using preference optimization on limited data. The resulting models show strong alignment with inward-facing Stoic virtues but struggle with outward-facing cosmopolitan duties.

Contribution

It demonstrates that preference optimization on micro-datasets can instill inward-facing Stoic virtues in small models, while revealing a persistent limitation in modeling outward-facing cosmopolitan duties.

Findings

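01. 300 high-quality examples induce strong alignment with inward-facing Stoic virtues

02. Fine-tuned models nearly match few-shot prompting performance while freeing the context window

03. All models, including few-shot baselines, fail to capture outward-facing cosmopolitan duties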

Abstract

While large language models excel at factual adaptation, their ability to internalize nuanced philosophical frameworks under severe data constraints remains underexplored. We investigate this by specializing small LLMs on micro-datasets of foundational Stoic texts using preference optimization (ORPO, AlphaPO). Evaluated via a multi-model critic bank, our results show that just 300 high-fidelity examples can induce strong alignment with inward-facing Stoic virtues, closely approaching few-shot prompting while freeing the context window. Critically, however, all models, including few-shot baselines, exhibit a persistent failure on Stoicism's outward-facing cosmopolitan duties, pointing to a representational limitation of small models that micro-dataset adaptation alone cannot overcome.
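
The abstract names ORPO (odds ratio preference optimization) as one of the two objectives. For readers unfamiliar with it, the following is a minimal Python sketch of the per-pair ORPO loss as it is usually stated: L = L_SFT + λ · L_OR, where L_OR = -log σ(log(odds(y_w|x) / odds(y_l|x))), odds(y|x) = P(y|x) / (1 - P(y|x)), and P(y|x) is the length-normalized likelihood of a response. This is not the authors' training code. The function names, the default weight lam = 0.1, and the assumption that per-token log-probabilities of each response are already available are all illustrative, and AlphaPO is not covered.

```python
import math


def mean_token_logprob(token_logprobs):
    """Length-normalized log-likelihood: average per-token log P(y|x)."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob):
    """log odds(y|x) = log p - log(1 - p), with p = exp(avg_logprob).

    Assumes avg_logprob < 0 so that 0 < p < 1.
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def orpo_pair_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    """ORPO loss for one (chosen, rejected) response pair sharing a prompt.

    chosen_logprobs / rejected_logprobs: per-token log-probabilities of each
    response under the model being trained (assumed precomputed).
    lam: weight on the odds-ratio term (hypothetical illustrative default).
    """
    lp_w = mean_token_logprob(chosen_logprobs)
    lp_l = mean_token_logprob(rejected_logprobs)
    # SFT term: negative length-normalized log-likelihood of the chosen response.
    l_sft = -lp_w
    # Odds-ratio term: -log sigmoid(z) == softplus(-z),
    # where z = log odds(chosen) - log odds(rejected).
    z = log_odds(lp_w) - log_odds(lp_l)
    l_or = softplus(-z)
    return l_sft + lam * l_or


# Hypothetical pair: the chosen (more Stoic) reply is more likely than the rejected one.
print(orpo_pair_loss([-0.3, -0.7], [-1.5, -2.5]))
```

The odds-ratio term falls below log 2 when the chosen response has higher odds than the rejected one and rises above it otherwise, so minimizing it pushes the model toward preferred responses while the SFT term keeps it fitting the chosen text.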