Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts
Melvyn Drag, Gauthaman Vasudevan

TL;DR
This paper shows that a popular, efficient method for calculating bigram frequencies only approximates the counts and is unsuitable for corpora of short texts. It introduces a simple alternative that yields exact counts at the same computational complexity.
Contribution
The paper proposes a new exact counting method for bigram frequencies in short texts, improving accuracy over approximate methods without increasing computational complexity.
Findings
The new method provides exact bigram counts in short texts.
It matches the computational complexity of traditional methods.
It outperforms approximate methods in accuracy.
Abstract
We show that an efficient and popular method for calculating bigram frequencies is unsuitable for bodies of short texts and offer a simple alternative. Our method has the same computational complexity as the old method and offers an exact count instead of an approximation.
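The abstract does not spell out either method, so the following is only an illustrative sketch and not the authors' algorithm. It assumes the approximation arises from treating the whole corpus as one continuous token stream, which pairs the last word of each text with the first word of the next. The function names, the whitespace tokenization, and the toy corpus are all hypothetical. Both functions make a single pass over the tokens, so they share the same linear cost.

```python
from collections import Counter


def bigram_counts_per_text(texts):
    """Count bigrams inside each text only (hypothetical helper).

    No bigram spans a boundary between two texts, so the counts are exact.
    """
    counts = Counter()
    for text in texts:
        tokens = text.split()  # assumption: simple whitespace tokenization
        counts.update(zip(tokens, tokens[1:]))
    return counts


def bigram_counts_concatenated(texts):
    """Count bigrams over the concatenated corpus (hypothetical helper).

    This pairs the last token of one text with the first token of the next,
    which adds one spurious bigram per boundary.
    """
    tokens = [tok for text in texts for tok in text.split()]
    return Counter(zip(tokens, tokens[1:]))


# Toy corpus of three short texts, 7 tokens in total.
corpus = ["new york", "city lights", "new york city"]

exact = bigram_counts_per_text(corpus)
naive = bigram_counts_concatenated(corpus)

print(sum(exact.values()))  # 4 bigrams: 7 tokens minus 3 texts
print(sum(naive.values()))  # 6 bigrams: 7 tokens minus 1
```

In this toy corpus the concatenated count reports ("york", "city") twice and invents ("lights", "new"), while the per-text count does neither. The shorter the texts, the larger the share of boundary bigrams, which is one plausible reason an approximation that is harmless on long documents could fail on short ones.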
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling
