Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks

J. T. Stevanak, David M. Larue, and Lincoln D. Carr

arXiv:1007.3254 · cs.CL · October 15, 2010 · 5 cites

Open Access

TL;DR

This study builds semantic networks from texts and uses complex network measures to distinguish fictional from non-fictional writing, with roughly 70-74% classification accuracy.

Contribution

The paper applies complex network theory to text analysis and identifies the parameters, namely the word distance $m$ and the minimal text length, that best separate text types on the basis of power-law fits.

Findings

01. Power-law distributions effectively characterize text types.

02. The optimal word distance for classification is $m = 4$.

03. Classification accuracy is approximately 70-74%.
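
To make the power-law finding concrete, the following is a minimal sketch of estimating an exponent $\gamma$ in $P(k) \propto k^{-\gamma}$ from a degree distribution. It uses an ordinary least-squares fit on log-log axes, which is an assumption chosen for simplicity; the source does not state which fitting procedure the authors used, and the function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(dist):
    """Fit log P(k) = log A - gamma * log k by ordinary least squares.

    `dist` maps degree k to probability P(k). Returns (gamma, log_A).
    Assumption: a plain log-log regression, not necessarily the
    procedure used in the paper.
    """
    # Keep only points where both logarithms are defined.
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with k > 0 and P(k) > 0")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    # The power-law exponent is the negated slope on log-log axes.
    return -slope, mean_y - slope * mean_x

# Synthetic, unnormalized example data following k^-2 exactly (illustrative only).
synthetic = {k: k ** -2.0 for k in range(1, 50)}
gamma, log_a = fit_power_law(synthetic)
```

Least squares on log-log axes is easy to follow but can be biased on noisy tails, so a maximum-likelihood fit may be preferable for real text data.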

Abstract

We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$. The literature samples were found to be most effectively represented by their corresponding power laws over degree distribution $P(k)$ and clustering coefficient $C(k)$; we also studied the mean geodesic distance, and found all our texts were small-world networks. We observed a natural break-point at $k = N$ where the power law in…
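
The network construction described in the abstract can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes lowercase tokenization by a simple regular expression, treats edges as undirected and unweighted, and ignores self-links. The function names (`build_word_network`, `degree_distribution`, `local_clustering`) are hypothetical.

```python
import re
from collections import defaultdict

def build_word_network(text, m=4):
    """Link each distinct word to every other word within m positions of it.

    Assumptions: regex tokenization, undirected unweighted edges, no self-links.
    Returns an adjacency map {word: set_of_neighbouring_words}.
    """
    words = re.findall(r"[a-z']+", text.lower())
    adj = defaultdict(set)
    for i, w in enumerate(words):
        adj[w]  # touching the key registers w as a vertex even if isolated
        for j in range(i + 1, min(i + m + 1, len(words))):
            if words[j] != w:
                adj[w].add(words[j])
                adj[words[j]].add(w)
    return dict(adj)

def degree_distribution(adj):
    """P(k): fraction of the N vertices that have degree k."""
    n = len(adj)
    counts = defaultdict(int)
    for neighbours in adj.values():
        counts[len(neighbours)] += 1
    return {k: c / n for k, c in sorted(counts.items())}

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))

sample = "the cat sat on the mat"
net_m1 = build_word_network(sample, m=1)
net_m2 = build_word_network(sample, m=2)
p_k = degree_distribution(net_m1)
```

Averaging `local_clustering` over all vertices that share a degree $k$ would give an estimate of $C(k)$; increasing `m` adds edges and generally raises both degrees and clustering.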

Taxonomy

Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · Data Visualization and Analytics