# Approximating probabilistic models as weighted finite automata

**Authors:** Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol

arXiv: 1905.08701 · 2021-02-01

## TL;DR

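This paper introduces an efficient algorithm that approximates generic probabilistic models over sequences as weighted finite automata (WFA) by minimizing the Kullback-Leibler divergence from the source model. The resulting compact WFA support applications such as $n$-gram and character-level language modeling.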
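
To make "minimizing divergence" concrete, write $p$ for the source model and $q$ for the distribution defined by a target WFA drawn from some family $\mathcal{Q}$ (notation ours, not necessarily the paper's). The entropy term $\sum_x p(x)\log p(x)$ does not depend on $q$, so it drops out of the objective. If we additionally assume the target is deterministic, each string $x$ follows a single path, and $q(x)=\prod_e w_e^{c_e(x)}$, where $w_e$ is the weight of arc $e$ (final weights treated as arcs) and $c_e(x)$ is the number of times that path uses $e$:

$$
\begin{aligned}
\hat{q} &= \operatorname*{arg\,min}_{q \in \mathcal{Q}} D(p \,\|\, q)
         = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \sum_{x} p(x) \log \frac{p(x)}{q(x)}
         = \operatorname*{arg\,max}_{q \in \mathcal{Q}} \sum_{x} p(x) \log q(x), \\
\sum_{x} p(x) \log q(x) &= \sum_{e} \mathbb{E}_{p}\!\left[c_e(X)\right] \log w_e .
\end{aligned}
$$

Under these assumptions, the source enters the objective only through the expected arc counts $\mathbb{E}_p[c_e(X)]$. This explains why a counting pass over the source is a natural first step.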

## Contribution

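The paper presents an algorithm that approximates a generic probabilistic model over sequences as a weighted finite automaton by minimizing the Kullback-Leibler divergence between the source model and the target WFA. The algorithm consists of a counting step followed by a difference-of-convex optimization step, and both can be carried out efficiently. Its practical value is demonstrated on several language modeling tasks.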
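
The sketch below illustrates the counting idea in its simplest setting. It assumes a hypothetical toy source model (an explicit distribution over three strings) and a target with a full bigram topology and no backoff (failure) arcs. In that case the KL-minimizing arc weights are the normalized expected bigram counts. This closed form does not reproduce the paper's difference-of-convex step. The names `SOURCE`, `expected_bigram_counts` and `fit_bigram_wfa` are illustrative only.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"

# Hypothetical toy source model: an explicit distribution over three strings.
SOURCE = {
    ("a", "b"): 0.5,
    ("a", "a", "b"): 0.3,
    ("b",): 0.2,
}


def expected_bigram_counts(source):
    """Counting step: E_p[c(h, a)] for every bigram (h, a), with <s>/</s> padding."""
    counts = defaultdict(float)
    for seq, prob in source.items():
        padded = (BOS,) + seq + (EOS,)
        for h, a in zip(padded, padded[1:]):
            counts[(h, a)] += prob
    return dict(counts)


def fit_bigram_wfa(counts):
    """Arc weight q(a | h) = E[c(h, a)] / E[c(h)]: the KL-minimizing choice for a
    bigram topology with no backoff (failure) arcs (an assumption of this sketch)."""
    totals = defaultdict(float)
    for (h, _), c in counts.items():
        totals[h] += c
    return {(h, a): c / totals[h] for (h, a), c in counts.items()}


def seq_prob(model, seq):
    """q(x): product of arc weights along the single path that spells x."""
    padded = (BOS,) + seq + (EOS,)
    prob = 1.0
    for h, a in zip(padded, padded[1:]):
        prob *= model.get((h, a), 0.0)
    return prob


def kl(source, model):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), summed over the support of p."""
    return sum(p * math.log(p / seq_prob(model, x)) for x, p in source.items())


model = fit_bigram_wfa(expected_bigram_counts(SOURCE))
print(round(model[("a", "b")], 4))  # 0.7273
print(kl(SOURCE, model) >= 0.0)     # True
```

Here "a" is followed by "b" with expected count 0.8 and by "a" with expected count 0.3. The fitted weight for a → b is therefore 0.8 / 1.1. Any other normalized choice of weights for the same topology gives a KL divergence at least as large.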

## Key findings

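- Generic probabilistic models over sequences can be approximated as WFA by minimizing the Kullback-Leibler divergence, using an efficient counting step and a difference-of-convex optimization step
- Demonstrated applications: distilling $n$-gram models from neural models, building compact language models, and building open-vocabulary character models
- The algorithms used in the experiments are available in an open-source software library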
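
When the source is a neural model, its expected counts usually cannot be summed in closed form. One simple way to estimate them is by Monte Carlo sampling from the model. The sketch below assumes a hypothetical `toy_source_sampler` standing in for a neural LM: its stop probability grows with prefix length, so it is not itself a bigram model. The sketch estimates bigram counts from samples and normalizes them per history. It illustrates the general idea only and is not necessarily the paper's distillation procedure.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def toy_source_sampler(rng):
    """Hypothetical stand-in for a neural LM that we can only sample from.
    Stopping gets likelier as the prefix grows (certain at length 5)."""
    seq = []
    while True:
        if seq and rng.random() < 0.2 * len(seq):
            break
        seq.append("a" if rng.random() < 0.6 else "b")
    return tuple(seq)


def estimate_bigram_wfa(sampler, num_samples, seed=0):
    """Monte Carlo counting step, then per-history normalization of bigram counts."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        padded = (BOS,) + sampler(rng) + (EOS,)
        counts.update(zip(padded, padded[1:]))
    totals = defaultdict(int)
    for (h, _), c in counts.items():
        totals[h] += c
    return {(h, a): c / totals[h] for (h, a), c in counts.items()}


model = estimate_bigram_wfa(toy_source_sampler, num_samples=20_000)
print(round(model[(BOS, "a")], 2))  # roughly 0.6 (Monte Carlo estimate)
```

More samples reduce the variance of the estimated counts. The normalization step is unchanged from the exact-count case.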

## Abstract

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling $n$-gram models from neural models, building compact language models, and building open-vocabulary character models. The algorithms used for these experiments are available in an open-source software library.
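
The abstract describes the second step as a difference-of-convex (DC) optimization. As a generic illustration only, the sketch below applies the convex-concave procedure (CCCP), a standard method for DC programs, to a made-up one-dimensional objective f = g - h with g(x) = x⁴ and h(x) = 2x². This toy objective is not the paper's objective, and CCCP is not necessarily the solver the paper uses. Each iteration linearizes h at the current point and minimizes the resulting convex surrogate, which guarantees that f never increases.

```python
import math


def f(x):
    """Toy DC objective f = g - h with g(x) = x**4 and h(x) = 2 * x**2 (both convex)."""
    return x ** 4 - 2 * x ** 2


def cccp(x0, iters=50):
    """Convex-concave procedure: linearize h at x_t, then minimize the convex
    surrogate g(x) - h'(x_t) * x. Here h'(x_t) = 4 * x_t, so the surrogate's
    unique minimizer solves 4 * x**3 = 4 * x_t, i.e. x_{t+1} = cbrt(x_t)."""
    xs = [x0]
    for _ in range(iters):
        x_t = xs[-1]
        xs.append(math.copysign(abs(x_t) ** (1.0 / 3.0), x_t))
    return xs


xs = cccp(0.2)
print(round(xs[-1], 6), round(f(xs[-1]), 6))  # 1.0 -1.0
```

Starting from 0.2, the iterates climb toward the minimizer x = 1, where f = -1, and the objective value decreases at every step.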

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.08701/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1905.08701/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/1905.08701/full.md

---
Source: https://tomesphere.com/paper/1905.08701