Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks
J. T. Stevanak, David M. Larue, and Lincoln D. Carr

TL;DR
This study uses complex network measures derived from semantic networks of texts to effectively distinguish between fictional and non-fictional writing, achieving around 70-74% classification accuracy.
Contribution
It introduces a novel method applying complex network theory to text analysis, identifying optimal parameters for distinguishing text types based on power law distributions.
Findings
Power laws over the degree distribution and clustering coefficient most effectively characterize the text types.
Optimal word distance for classification is m=4.
Achieved approximately 70-74% classification accuracy.
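As a rough illustration of what fitting a power law to a degree distribution involves, here is a minimal sketch. It assumes a plain least-squares line through the log-log points, which is a simple stand-in and not necessarily the fitting procedure the authors used. The names `degree_distribution` and `fit_power_law`, and the symbols P(k) and gamma, are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: fraction of vertices with degree k}, skipping k = 0."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in counts.items() if k > 0}


def fit_power_law(dist):
    """Fit P(k) ~ c * k**(-gamma) by least squares on (log k, log P(k)).

    Returns (gamma, log_c). Assumption: a straight-line fit in log-log
    space, used here only as a simple illustration.
    """
    if len(dist) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Synthetic check: an exact power law with exponent 2.
synthetic = {k: k ** -2.0 for k in range(1, 11)}
gamma, log_c = fit_power_law(synthetic)
```

On real text networks the fitted exponent depends on which range of degrees is included, so the choice of fitting range matters as much as the fitting method.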
Abstract
We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with independent words as vertices or nodes, and edges or links allotted to words occurring within m places of a given vertex; we call m the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance m. The literature samples were found to be most effectively represented by their corresponding power laws over degree distribution and clustering coefficient; we also studied the mean geodesic distance, and found all our texts were small-world networks. We observed a natural break-point at where the power law in…
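The network construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the lowercasing, the exclusion of self-loops, and the function names `build_word_network`, `degree`, and `local_clustering` are all choices made for this example.

```python
import re
from collections import defaultdict


def build_word_network(text, m):
    """Build an undirected word network.

    Vertices are the unique words; two distinct words are linked when they
    occur within m positions of each other in the text.
    """
    words = re.findall(r"[a-z']+", text.lower())  # assumed tokenizer
    graph = defaultdict(set)
    for i, w in enumerate(words):
        graph[w]  # touch the key so every word appears as a vertex
        for j in range(i + 1, min(i + m + 1, len(words))):
            if words[j] != w:  # assumption: no self-loops
                graph[w].add(words[j])
                graph[words[j]].add(w)
    return dict(graph)


def degree(graph):
    """Return {word: number of distinct neighbours}."""
    return {w: len(nbrs) for w, nbrs in graph.items()}


def local_clustering(graph, w):
    """Fraction of pairs of w's neighbours that are themselves linked."""
    nbrs = list(graph[w])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in graph[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))


sample = "the cat sat on the mat and the dog sat on the rug"
network = build_word_network(sample, m=4)
degrees = degree(network)
```

Increasing m adds edges between words that sit farther apart, so the degrees and clustering coefficients of the same text change with the word distance.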
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · Data Visualization and Analytics
