BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories
Yuxuan Ouyang, Yingfeng Luo, Jingbo Zhu, Tong Xiao

TL;DR
This paper introduces BiasedTales-ML, a large multilingual dataset of children's stories generated by LLMs, and analyzes cross-lingual differences in narrative attributes to improve multilingual AI safety and alignment evaluation.
Contribution
It presents a new multilingual dataset and a framework for analyzing narrative attribute variations across languages, models, and social contexts.
Findings
Significant cross-lingual variability in narrative generation patterns.
English-centric evaluation may not generalize to other languages.
Structural narrative patterns differ across linguistic and cultural contexts.
Abstract
Large Language Models (LLMs) are increasingly used to generate narrative content, including children's stories, which play an important role in social and cultural learning. Despite growing interest in AI safety and alignment, most existing evaluations focus primarily on English, leaving the cross-lingual generalization of aligned behavior underexplored. In this work, we introduce BiasedTales-ML, a large-scale parallel corpus of approximately 350,000 children's stories generated across eight typologically and culturally diverse languages using a full-permutation prompting design. We propose a structured generator-extractor pipeline and a multi-dimensional distributional analysis framework to examine how narrative attributes vary across languages, models, and social conditions. Our analysis reveals substantial cross-lingual variability in narrative generation patterns, indicating that…
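As a rough illustration of what a full-permutation prompting design could look like, the sketch below enumerates every combination of a few prompt attributes as a Cartesian product. The axis names and values (`language`, `protagonist_gender`, `setting`) and the helper `build_prompt_grid` are hypothetical placeholders; this summary does not list the paper's actual attribute axes, languages, or prompt templates.

```python
from itertools import product

# Hypothetical attribute axes, chosen only for illustration.
# The paper's real axes and values are not given in this summary.
AXES = {
    "language": ["en", "zh", "ar"],
    "protagonist_gender": ["girl", "boy"],
    "setting": ["village", "city"],
}


def build_prompt_grid(axes):
    """Return one dict per combination of attribute values (full permutation)."""
    names = list(axes)
    value_lists = [axes[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


grid = build_prompt_grid(AXES)
print(len(grid))  # 3 * 2 * 2 = 12 prompt conditions
```

Because every condition appears exactly once per combination, outputs for different languages can be compared under otherwise identical settings, which is presumably what makes the corpus "parallel".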
