Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts

Melvyn Drag; Gauthaman Vasudevan

arXiv:1604.05559 · cs.CL · April 20, 2016

TL;DR

This paper shows that a popular, efficient method for calculating bigram frequencies is unsuitable for corpora of short texts, and offers a simple alternative that returns exact counts at the same computational complexity.

Contribution

The paper proposes an exact counting method for bigram frequencies in corpora of short texts. It replaces the approximation produced by the popular method with an exact count, without increasing computational complexity.

Findings

01

The new method provides exact bigram counts in short texts.

02

It has the same computational complexity as the popular method it replaces.

03

It returns an exact count where the popular method returns only an approximation.

Abstract

We show that an efficient and popular method for calculating bigram frequencies is unsuitable for bodies of short texts and offer a simple alternative. Our method has the same computational complexity as the old method and offers an exact count instead of an approximation.
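The abstract does not spell out either method, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes the "popular" approach slides a two-token window over the whole corpus as one concatenated stream, and that an exact alternative counts pairs inside each text separately. The function names, the whitespace tokenization, and the toy corpus are all hypothetical.

```python
from collections import Counter


def bigrams_concatenated(texts):
    """Assumed 'popular' approach: treat the corpus as one token stream.

    Pairs that straddle a boundary between two texts are counted too,
    so the result is only an approximation for short texts.
    """
    tokens = [tok for text in texts for tok in text.split()]
    return Counter(zip(tokens, tokens[1:]))


def bigrams_per_text(texts):
    """Assumed exact approach: count pairs within each text only."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical toy corpus of very short texts.
corpus = ["good morning", "morning coffee", "good morning"]

approx = bigrams_concatenated(corpus)
exact = bigrams_per_text(corpus)

# Bigrams that exist only because two texts were glued together.
spurious = approx - exact
```

Under these assumptions both functions visit each token a constant number of times, so they share the same linear cost in corpus size. The shorter the texts, the larger the share of boundary-spanning pairs: in the toy corpus, two of the five pairs in the concatenated stream are spurious.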

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling