Tutorial: $\varphi$-Transductions in OpenFst via the Gallic Semiring
Marco Cognetta, Cyril Allauzen

TL;DR
This tutorial explains how to implement $\varphi$-transductions in OpenFst using the Gallic semiring, so that $\varphi$-transitions can be used correctly with transducers. It includes practical code examples and demonstrates the approach on the MaxMatch tokenization algorithm.
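The Gallic idea can be illustrated without OpenFst. A Gallic weight pairs an output string with an ordinary weight, and its product ($\otimes$) concatenates the strings while combining the ordinary weights. The intuition, stated here as an assumption about the approach rather than a description of the tutorial's code, is that moving each arc's output label into its weight yields an acceptor over input labels only. The class `GallicSketch` and the helper `path_weight` below are hypothetical names; they assume a tropical component (so costs add), model only $\otimes$ (not $\oplus$), and are not OpenFst's `GallicWeight` class or its `ToGallicMapper`/`FromGallicMapper` conversions.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class GallicSketch:
    """Toy (output string, tropical cost) pair; not OpenFst's GallicWeight."""
    output: tuple  # sequence of output labels (the "string" component)
    cost: float    # tropical component, e.g. a negative log probability

    def times(self, other: "GallicSketch") -> "GallicSketch":
        # Gallic product: concatenate the output strings, add the tropical costs.
        return GallicSketch(self.output + other.output, self.cost + other.cost)


# Identity for times: empty output string and zero cost.
ONE = GallicSketch((), 0.0)


def path_weight(arc_weights):
    """Weight of a path: the ordered Gallic product of its arc weights."""
    return reduce(GallicSketch.times, arc_weights, ONE)


# Each arc's output label now lives in its weight; the middle arc emits nothing.
arcs = [
    GallicSketch(("un",), 0.5),
    GallicSketch((), 1.0),
    GallicSketch(("##able",), 0.25),
]
print(path_weight(arcs))  # GallicSketch(output=('un', '##able'), cost=1.75)
```

Because the product is not commutative in its string component, the order of arcs along the path determines the order of the emitted output.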
Contribution
It introduces a method for implementing $\varphi$-transductions in OpenFst via the Gallic semiring, working around an implementation constraint that prevents $\varphi$-transitions from being used directly with transducers.
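The intended behavior of a $\varphi$-transduction can be simulated directly. A $\varphi$-transition consumes no input and is taken only when the current state has no explicit transition for the next input symbol; any output it carries is emitted along the way, much as a Gallic weight would carry it. The function `phi_transduce`, the label `PHI`, and the dictionary encoding of the machine are hypothetical. This is a deterministic toy that ignores numeric weights and does not follow $\varphi$-transitions at the end of the input; it is not a model of OpenFst's `PhiMatcher`.

```python
PHI = "<phi>"  # hypothetical reserved label marking a failure (phi) arc


def phi_transduce(arcs, start, finals, inputs):
    """Run a deterministic machine with phi arcs and string outputs.

    arcs maps (state, label) -> (next_state, output_tuple); label may be PHI.
    A phi arc consumes no input and is followed only when the current state
    has no explicit arc for the next input symbol. Outputs on the phi arcs
    that are taken are emitted like any other output.
    Returns the concatenated output tuple, or None if the input is rejected.
    """
    state, output = start, ()
    for sym in inputs:
        visited = set()  # guards against phi cycles, which never consume input
        while (state, sym) not in arcs:
            if (state, PHI) not in arcs or state in visited:
                return None  # no match and no usable fallback: reject
            visited.add(state)
            state, out = arcs[(state, PHI)]
            output += out
        state, out = arcs[(state, sym)]
        output += out
    # Simplification: phi arcs are not followed once the input is exhausted.
    return output if state in finals else None


machine = {
    (0, "a"): (1, ("x",)),
    (1, "a"): (1, ("y",)),
    (1, PHI): (0, ("|",)),  # fallback from state 1 to state 0, emitting "|"
    (0, "b"): (0, ("z",)),
}
print(phi_transduce(machine, 0, {0, 1}, "aab"))  # ('x', 'y', '|', 'z')
```

In the example, the second `a` is read by state 1's explicit arc even though a $\varphi$-arc is available, while `b` has no arc at state 1 and forces the fallback to state 0, which emits `|`.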
Findings
Successful implementation of $\varphi$-transductions using the Gallic semiring
Demonstration with the MaxMatch tokenization algorithm
Provision of self-contained code examples
Abstract
OpenFst, a popular finite-state transducer library, supports $\varphi$-transitions but, due to an implementation constraint, they cannot be used with transducers in a straightforward way. In this short tutorial, we describe how one can use other functionality provided by OpenFst (namely, the Gallic semiring) to correctly implement $\varphi$-transductions and demonstrate it by implementing the MaxMatch (WordPiece) tokenization algorithm (Devlin et al., 2019; Song et al., 2021). Accompanying self-contained code examples are provided: https://www.openfst.org/twiki/pub/Contrib/FstContrib/phi_transduction_tutorial_code.tgz
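For reference, MaxMatch as used in WordPiece tokenization greedily takes the longest vocabulary entry matching at the current position, marks non-initial pieces with a continuation prefix (`##` in BERT), and maps the whole word to an unknown token if some position has no match. The function `max_match` and the toy vocabulary below are illustrative only. This is the straightforward quadratic-time formulation, not the FST construction from the tutorial and not the linear-time algorithm of Song et al. (2021).

```python
def max_match(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first (MaxMatch) tokenization of a single word.

    Non-initial pieces are looked up with the continuation prefix prepended.
    If some position has no matching piece, the whole word becomes `unk`.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no vocabulary entry covers this position
        tokens.append(match)
        start = end  # continue right after the matched piece
    return tokens


vocab = {"un", "##aff", "##able", "a"}
print(max_match("unaffable", vocab))  # ['un', '##aff', '##able']
print(max_match("xyz", vocab))        # ['[UNK]']
```

Each failed lookup backs off to a shorter candidate and rescans; Song et al. (2021) avoid this rescanning by precomputing trie failure links, which are closely related to the $\varphi$-transitions discussed here.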
