Reduction of Intermediate Alphabets in Finite-State Transducer Cascades
Andre Kempe

TL;DR
This paper introduces an algorithm that reduces the intermediate alphabets in cascades of finite-state transducers (FSTs). The component FSTs are modified, but the relation described by the whole cascade stays the same and nothing extra is needed at runtime. Two NLP examples illustrate the effect on FST and alphabet sizes.
Contribution
The paper presents an algorithm that reduces the intermediate alphabets in FST cascades without altering the overall relation and without requiring additional information or a special algorithm at runtime.
Findings
For some FSTs, the number of arcs and symbols shrank considerably
The overall relation described by the cascade is unchanged, and no runtime overhead is introduced
Illustrated on two examples from Natural Language Processing
Abstract
This article describes an algorithm for reducing the intermediate alphabets in cascades of finite-state transducers (FSTs). Although the method modifies the component FSTs, there is no change in the overall relation described by the whole cascade. No additional information or special algorithm that could slow down the processing of input is required at runtime. Two examples from Natural Language Processing are used to illustrate the effect of the algorithm on the sizes of the FSTs and their alphabets. With some FSTs the number of arcs and symbols shrank considerably.
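The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a two-FST cascade without epsilon arcs and uses a simple sufficient condition chosen for illustration: two intermediate symbols may be merged when the second FST has exactly the same arcs (same source, output, and destination) for both. The dictionary representation and the names `transduce`, `cascade`, and `reduce_intermediate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical FST representation (an assumption for this sketch):
# {"start": state, "finals": set of states, "arcs": set of (src, inp, out, dst)}

def transduce(fst, word):
    """Return all output tuples the FST produces for `word` (no epsilon arcs)."""
    configs = {(fst["start"], ())}
    for sym in word:
        configs = {(dst, out + (o,))
                   for (state, out) in configs
                   for (src, i, o, dst) in fst["arcs"]
                   if src == state and i == sym}
    return {out for (state, out) in configs if state in fst["finals"]}

def cascade(t1, t2, word):
    """Feed every output of t1 into t2 and collect the results."""
    return {res for mid in transduce(t1, word) for res in transduce(t2, mid)}

def reduce_intermediate(t1, t2):
    """Merge intermediate symbols that t2 treats identically (illustrative criterion)."""
    # Signature of an intermediate symbol: the set of t2 arcs that read it.
    sig = defaultdict(set)
    for (src, i, o, dst) in t2["arcs"]:
        sig[i].add((src, o, dst))
    # Choose one representative symbol per distinct signature.
    rep_of_sig, rep = {}, {}
    for sym in sorted(sig):
        rep[sym] = rep_of_sig.setdefault(frozenset(sig[sym]), sym)
    # Rewrite t1 outputs and t2 inputs; duplicate t2 arcs collapse in the set.
    new_t1 = dict(t1, arcs={(s, i, rep.get(o, o), d) for (s, i, o, d) in t1["arcs"]})
    new_t2 = dict(t2, arcs={(s, rep[i], o, d) for (s, i, o, d) in t2["arcs"]})
    return new_t1, new_t2

# Toy cascade: t1 tags words, t2 maps the tags to coarser categories.
t1 = {"start": 0, "finals": {0},
      "arcs": {(0, "the", "DET", 0), (0, "a", "ART", 0), (0, "cat", "N", 0)}}
t2 = {"start": 0, "finals": {0},
      "arcs": {(0, "DET", "D", 0), (0, "ART", "D", 0), (0, "N", "NOUN", 0)}}

r1, r2 = reduce_intermediate(t1, t2)
before = cascade(t1, t2, ("the", "cat"))
after = cascade(r1, r2, ("the", "cat"))
```

In this toy cascade, `DET` and `ART` are handled identically by the second FST, so they collapse into one intermediate symbol and the second FST loses an arc, while `before` and `after` are the same set. The reduced FSTs are ordinary FSTs, so the same lookup code runs on them unchanged.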
Taxonomy
Topics: semigroups and automata theory · Algorithms and Data Compression · Logic, programming, and type systems
