# On the number of k-skip-n-grams

**Authors:** Dmytro Krasnoshtan

arXiv: 1905.05407 · 2019-05-15

## TL;DR

This paper derives a closed-form formula for the exact number of k-skip-n-grams in a corpus of size $L$, expressed in terms of $L$, $n$, and $k$. The count is relevant to NLP methods that build on skip-grams.

## Contribution

The paper proves a closed-form expression for the number of k-skip-n-grams in a corpus of size $L$, a theoretical result about n-gram extraction in natural language processing.

## Key findings

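- Derives and proves a closed-form formula for the number of k-skip-n-grams
- The formula depends on the corpus size $L$, the n-gram length $n$, and the skip parameter $k$, with $k$ capped at $k' = \min(L - n + 1, k)$
- The count can be computed directly from these parameters, which may support analysis of skip-gram models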

## Abstract

The paper proves that the number of k-skip-n-grams for a corpus of size $L$ is $$\frac{Ln + n + k' - n^2 - nk'}{n} \cdot \binom{n-1+k'}{n-1}$$ where $k' = \min(L - n + 1, k)$.
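The formula can be checked numerically on small inputs. The Python sketch below is illustrative only: it assumes the common definition of a k-skip-n-gram (an ordered selection of $n$ corpus positions in which the total number of skipped positions is at most $k$), which this summary does not state, and it assumes $L \ge n \ge 1$ and $k \ge 0$. The function names are hypothetical and do not come from the paper. Both functions count occurrences by position, not distinct n-gram types.

```python
from itertools import combinations
from math import comb


def skipgram_count_closed_form(L: int, n: int, k: int) -> int:
    """Evaluate the closed-form count from the abstract.

    Assumes L >= n >= 1 and k >= 0 (an assumption of this sketch).
    """
    k_eff = min(L - n + 1, k)  # k' in the abstract
    numerator = L * n + n + k_eff - n * n - n * k_eff
    # The full product is a multiple of n, so integer division is exact.
    return numerator * comb(n - 1 + k_eff, n - 1) // n


def skipgram_count_brute_force(L: int, n: int, k: int) -> int:
    """Count position tuples i_1 < ... < i_n with at most k skipped positions.

    Uses the assumed definition: skips are totalled over the whole n-gram.
    """
    count = 0
    for idx in combinations(range(L), n):
        skipped = idx[-1] - idx[0] + 1 - n  # positions inside the span not used
        if skipped <= k:
            count += 1
    return count


if __name__ == "__main__":
    # A 5-token corpus has 9 bigrams with at most 2 skips.
    print(skipgram_count_closed_form(5, 2, 2))  # 9
    print(skipgram_count_brute_force(5, 2, 2))  # 9
```

The brute-force version is exponential in $n$ and is only meant as a cross-check; agreement on small cases is evidence for the assumed definition, not a substitute for the paper's proof.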

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.05407/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/1905.05407/full.md

---
Source: https://tomesphere.com/paper/1905.05407