TL;DR
This paper introduces a bi-gram graph representation for text and corpus analysis. It shows that the representation is computationally efficient, versatile, and scalable, and that it yields a range of semantic and corpus-level insights.
Contribution
The paper presents a novel bi-gram graph method for text analysis, highlighting its computational simplicity and its broad applicability to large datasets.
Findings
Bi-gram graphs are computationally cheap to create.
The approach provides unique insights through graph attributes.
The method scales to large datasets and supports diverse use cases.
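This page does not spell out how the graph is built. The sketch below is a minimal illustration of why construction is cheap, under our own assumptions: nodes are tokens, a directed edge u → v means token v directly follows token u, and the edge weight counts how often that happens. The tokenizer (lowercase plus whitespace split) and the helper names `tokenize`, `build_bigram_graph` and `graph_stats` are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; the paper's own preprocessing may differ.
    return text.lower().split()


def build_bigram_graph(corpus: list[str]) -> dict[str, Counter]:
    """Hypothetical helper: graph[u][v] = times token v directly follows token u."""
    graph: dict[str, Counter] = defaultdict(Counter)
    for doc in corpus:
        tokens = tokenize(doc)
        # A single pass over adjacent token pairs: linear in corpus size.
        for u, v in zip(tokens, tokens[1:]):
            graph[u][v] += 1
    return dict(graph)


def graph_stats(graph: dict[str, Counter]) -> dict[str, int]:
    """Basic attributes: distinct nodes, distinct directed edges, total bi-gram count."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)  # include tokens that only appear as successors
    return {
        "nodes": len(nodes),
        "edges": sum(len(s) for s in graph.values()),
        "bigrams": sum(sum(s.values()) for s in graph.values()),
    }


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
graph = build_bigram_graph(corpus)
stats = graph_stats(graph)
print(stats)  # {'nodes': 7, 'edges': 8, 'bigrams': 10}
```

Raw adjacency counts are kept as edge weights, so both weighted and unweighted attributes can be derived later. In this sketch, bi-grams never span document boundaries. Whether the paper makes the same choices is not stated here.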
Abstract
We propose a new approach to text semantic analysis and general corpus analysis based on what we term a "bi-gram graph" representation of a corpus. Attributes derived from graph theory are measured and analyzed, both as insights in their own right and in comparison with the graphs of other corpora. We observe that a wide range of tools and algorithms can be built on top of this graph representation. Constructing the graph proves computationally cheap, and much of the heavy lifting is done by basic graph calculations. Furthermore, we showcase different use cases for bi-gram graphs and show how well the approach scales to large datasets.
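The abstract mentions analyzing graph attributes against other corpus graphs. The paper's specific comparison measures are not listed on this page. As one illustrative choice of our own, the sketch below compares two small bi-gram graphs by the Jaccard overlap of their edge sets and by unweighted out-degree. Both are basic graph calculations. The tokenization and the helper names `bigram_edges`, `edge_jaccard` and `out_degree` are hypothetical.

```python
from collections import Counter


def bigram_edges(text: str) -> Counter:
    """Hypothetical helper: weighted edges (u, v) -> count of v directly following u."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    return Counter(zip(tokens, tokens[1:]))


def edge_jaccard(a: Counter, b: Counter) -> float:
    """Fraction of distinct bi-gram edges shared by the two graphs (ignores weights)."""
    ea, eb = set(a), set(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


def out_degree(edges: Counter) -> Counter:
    """Number of distinct successor tokens per token (unweighted out-degree)."""
    deg: Counter = Counter()
    for u, _v in edges:
        deg[u] += 1
    return deg


ga = bigram_edges("the cat sat on the mat")
gb = bigram_edges("the dog sat on the rug")
print(edge_jaccard(ga, gb))          # 0.25
print(out_degree(ga).most_common(1))  # [('the', 2)]
```

Edge-set Jaccard is only one possible similarity. Weighted variants or degree-distribution comparisons would follow the same pattern.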
