StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
Ishmam Khan, Sindhuja Thogarrati, Shuo Zhang

TL;DR
This paper explores how small language models can be aligned with Stoic philosophy using preference optimization on limited data. The approach achieves strong alignment with inward-facing Stoic virtues but struggles with outward-facing cosmopolitan duties.
Contribution
It demonstrates that preference optimization on a micro-dataset can effectively instill inward-facing Stoic virtues in small models. It also reveals the limits of such models in capturing outward-facing cosmopolitan duties.
Findings
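300 high-quality examples induce strong alignment with inward-facing Stoic virtues
Preference-optimized models nearly match few-shot prompting performance while freeing the context window
All models, including few-shot baselines, fail to grasp outward-facing cosmopolitan duties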
Abstract
While large language models excel at factual adaptation, their ability to internalize nuanced philosophical frameworks under severe data constraints remains underexplored. We investigate this by specializing small LLMs on micro-datasets of foundational Stoic texts using preference optimization (ORPO, AlphaPO). Evaluated via a multi-model critic bank, our results show that just 300 high-fidelity examples can induce strong alignment with inward-facing Stoic virtues, closely approaching few-shot prompting while freeing the context window. Critically, however, all models, including few-shot baselines, exhibit a persistent failure on Stoicism's outward-facing cosmopolitan duties, pointing to a representational limitation of small models that micro-dataset adaptation alone cannot overcome.
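The abstract names ORPO as one of the preference-optimization methods used. To show roughly what such an objective optimizes, here is a minimal sketch of the standard ORPO loss for a single (chosen, rejected) response pair, written in plain Python over per-token log-probabilities. It is not the authors' training code. The helper names (`log_odds`, `log_sigmoid`, `orpo_loss`) and the default weight `lam=0.1` are assumptions for illustration. AlphaPO is not shown.

```python
import math


def log_odds(mean_logp: float) -> float:
    # log(p / (1 - p)) computed from log p, where p = exp(mean_logp).
    # Assumes mean_logp < 0 so that p < 1 and the odds are finite.
    return mean_logp - math.log1p(-math.exp(mean_logp))


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    # Length-normalised log-likelihood of each response (mean over tokens).
    lp_w = sum(chosen_token_logps) / len(chosen_token_logps)
    lp_l = sum(rejected_token_logps) / len(rejected_token_logps)

    # Supervised term: negative log-likelihood of the preferred response.
    sft = -lp_w

    # Odds-ratio term: push the odds of the chosen response above the
    # odds of the rejected one.
    l_or = -log_sigmoid(log_odds(lp_w) - log_odds(lp_l))

    # lam (hypothetical default) weights the preference term against SFT.
    return sft + lam * l_or


# Usage: the loss is lower when the model already prefers the chosen answer.
preferred = orpo_loss([-0.1, -0.2], [-1.0, -2.0])
dispreferred = orpo_loss([-1.0, -2.0], [-0.1, -0.2])
print(preferred, dispreferred)
```

Because the supervised and preference terms share one loss, ORPO does not require a separate reference model. This is part of its appeal when a small model is trained on a few hundred examples.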
