Atomic Literary Styling: Mechanistic Manipulation of Prose Generation in Neural Language Models
Tsogt-Ochir Enkhbayar

TL;DR
This paper investigates how individual neurons in GPT-2 relate to literary style, revealing that removing highly discriminative neurons unexpectedly enhances prose quality, challenging assumptions about neural causality.
Contribution
The paper identifies neurons associated with literary style in GPT-2 and demonstrates that ablating these neurons can improve generated prose, highlighting a gap between correlation and causality.
Findings
Identified over 27,000 neurons that discriminate literary style.
Ablating the 50 most discriminative neurons improves literary style metrics by 25.7%.
Removing neurons can enhance, not degrade, text quality.
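To make "ablating neurons" concrete, here is a minimal sketch: zero-ablation of chosen hidden units in a toy GPT-2-style MLP block with random weights, written in NumPy. GPT-2's MLP does use the tanh approximation of GELU. Everything else is a hypothetical stand-in, not the paper's setup: the function names `mlp_forward` and `top_k_neurons`, the choice of zero-ablation (rather than, e.g., mean-ablation), the dimensions, and the random "discrimination scores". The paper's model hooks, layer choice, and style metrics are not reproduced here.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, the activation used in GPT-2's MLP blocks
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_forward(x, w_in, b_in, w_out, b_out, ablate=None):
    """Toy GPT-2-style MLP block: x -> GELU(x @ W_in + b_in) @ W_out + b_out.

    `ablate` is an optional array of hidden-unit ("neuron") indices whose
    post-activation values are set to zero before the output projection.
    Returns (output, hidden activations actually used).
    """
    hidden = gelu(x @ w_in + b_in)      # shape (tokens, d_mlp)
    if ablate is not None and len(ablate) > 0:
        hidden = hidden.copy()
        hidden[:, ablate] = 0.0         # zero-ablation of the chosen neurons
    return hidden @ w_out + b_out, hidden

def top_k_neurons(scores, k):
    """Indices of the k neurons with the largest absolute score (e.g. effect size)."""
    return np.argsort(-np.abs(scores))[:k]

# Hypothetical usage with random weights and random per-neuron scores.
rng = np.random.default_rng(0)
d_model, d_mlp, n_tokens = 16, 64, 8
x = rng.normal(size=(n_tokens, d_model))
w_in = rng.normal(scale=0.1, size=(d_model, d_mlp))
b_in = np.zeros(d_mlp)
w_out = rng.normal(scale=0.1, size=(d_mlp, d_model))
b_out = np.zeros(d_model)

scores = rng.normal(size=d_mlp)         # stand-in for discrimination scores
ablate = top_k_neurons(scores, k=5)
clean_out, _ = mlp_forward(x, w_in, b_in, w_out, b_out)
ablated_out, hidden = mlp_forward(x, w_in, b_in, w_out, b_out, ablate=ablate)
print("max output change:", np.abs(clean_out - ablated_out).max())
```

Because the output projection is linear, removing a set of neurons subtracts exactly their contribution `hidden[:, idx] @ W_out[idx]` from the block's output. Whether that change helps or hurts downstream text is an empirical question, which is the point the findings above raise.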
Abstract
We present a mechanistic analysis of literary style in GPT-2, identifying individual neurons that discriminate between exemplary prose and rigid AI-generated text. Using Herman Melville's Bartleby, the Scrivener as a corpus, we extract activation patterns from 32,768 neurons in the late layers of the 355-million-parameter GPT-2 model. We find 27,122 statistically significant discriminative neurons (), with effect sizes up to . Through systematic ablation studies, we discover a paradoxical result: although these neurons correlate with literary text during analysis, removing them often improves rather than degrades the quality of generated prose. Specifically, ablating 50 highly discriminative neurons yields a 25.7% improvement in literary style metrics. This demonstrates a critical gap between observational correlation and causal necessity in neural networks. Our findings challenge the…
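The abstract does not name the statistical test or effect-size measure used to call a neuron "discriminative". As a minimal sketch under stated assumptions, one common recipe is: a per-neuron Welch's t-test between activations on literary text and on AI-generated text, Cohen's d as the effect size, a Bonferroni correction across all neurons, and a ranking of significant neurons by |d|. All of these choices are assumptions, as are the helper names (`welch_t`, `cohens_d`, `two_sided_p_normal`, `discriminative_neurons`) and the synthetic data; the p-value uses a normal approximation to the t distribution, which is only reasonable when both groups have many samples.

```python
import math
import numpy as np

def welch_t(a, b):
    """Per-neuron Welch t statistic; a and b are (samples, neurons) activation matrices."""
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va / len(a) + vb / len(b))

def cohens_d(a, b):
    """Per-neuron Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var)

def two_sided_p_normal(t):
    """Two-sided p-value P(|Z| > |t|) under a normal approximation."""
    flat = [math.erfc(abs(v) / math.sqrt(2.0)) for v in np.ravel(t)]
    return np.array(flat).reshape(np.shape(t))

def discriminative_neurons(lit, ai, alpha=0.05):
    """Return significant neuron indices sorted by |Cohen's d| (descending), plus d and p."""
    t = welch_t(lit, ai)
    p = two_sided_p_normal(t)
    d = cohens_d(lit, ai)
    # Bonferroni correction: divide alpha by the number of neurons tested
    significant = np.flatnonzero(p < alpha / lit.shape[1])
    order = significant[np.argsort(-np.abs(d[significant]))]
    return order, d, p

# Hypothetical usage on synthetic activations (not data from the paper).
rng = np.random.default_rng(0)
n_neurons = 1000
lit = rng.normal(size=(200, n_neurons))   # stand-in for "literary" activations
ai = rng.normal(size=(200, n_neurons))    # stand-in for "AI-generated" activations
lit[:, :20] += 1.0                        # plant 20 truly discriminative neurons
order, d, p = discriminative_neurons(lit, ai)
print(len(order), "significant; top by |d|:", order[:5])
```

The first k entries of `order` are the natural candidates for a top-k ablation experiment. The gap the abstract describes is exactly that a large |d| here says nothing, on its own, about what happens to generation when those neurons are removed.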
