Hallucination or Creativity: How to Evaluate AI-Generated Scientific Stories?
Alex Argese, Pasquale Lisena, Raphaël Troncy

TL;DR
This paper introduces StoryScore, a comprehensive metric designed to evaluate AI-generated scientific stories by capturing semantic, structural, and factual aspects, addressing challenges in assessing creativity and hallucinations.
Contribution
The work presents a novel composite metric, StoryScore, that combines multiple evaluation dimensions to better assess AI-generated scientific narratives, particularly with respect to creativity and factual accuracy.
Findings
StoryScore effectively evaluates semantic alignment and factual hallucinations.
Traditional detection methods often misclassify creative storytelling as hallucinations.
Automatic metrics struggle to assess narrative control and pedagogical creativity.
Abstract
Generative AI can turn scientific articles into narratives for diverse audiences, but evaluating these stories remains challenging. Storytelling demands abstraction, simplification, and pedagogical creativity: qualities that are often not well captured by standard summarization metrics. Meanwhile, factual hallucinations are critical in scientific contexts, yet detectors often misclassify legitimate narrative reformulations or prove unstable when creativity is involved. In this work, we propose StoryScore, a composite metric for evaluating AI-generated scientific stories. StoryScore integrates semantic alignment, lexical grounding, narrative control, structural fidelity, redundancy avoidance, and entity-level hallucination detection into a unified framework. Our analysis also reveals why many hallucination detection methods fail to distinguish pedagogical creativity from factual errors,…
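To illustrate what "integrating" six dimensions into one score could look like, here is a minimal sketch. It is not the authors' formula: the abstract does not say how the components are aggregated, so the weighted average, the equal default weights, the [0, 1] scaling, and all names (`StoryComponents`, `composite_score`, `hallucination_free`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class StoryComponents:
    """Hypothetical per-story component scores, each assumed to lie in [0, 1]."""
    semantic_alignment: float
    lexical_grounding: float
    narrative_control: float
    structural_fidelity: float
    redundancy_avoidance: float
    hallucination_free: float  # assumed: 1.0 means no entity-level hallucination found


def composite_score(components: StoryComponents,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the component scores (an assumed aggregation rule)."""
    names = [f.name for f in fields(components)]
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name in names:
        value = getattr(components, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted_sum = sum(weights[name] * getattr(components, name) for name in names)
    return weighted_sum / total_weight


# Usage: a story that is strong on every dimension except hallucination.
story = StoryComponents(1.0, 1.0, 1.0, 1.0, 1.0, 0.0)
score = composite_score(story)
```

Under this assumed rule, one failing dimension lowers the score in proportion to its weight, so a deployment that treats factual errors as disqualifying might raise the weight on `hallucination_free` or apply it as a multiplicative gate.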
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Artificial Intelligence in Games
