Directed Replacement
Lauri Karttunen

TL;DR
This paper presents a family of directed replace operators for finite-state calculus that enable unambiguous, left-to-right string transformations, useful for parsing, tokenization, and text filtering.
Contribution
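It introduces the directed replace operators UPPER @-> LOWER and UPPER @-> PREFIX ... SUFFIX, which process the input from left to right, act only on the longest match at each point, and yield an unambiguous transducer when the lower language consists of a single string.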
Findings
Yields unambiguous transducers for single-string replacements
Inserts text, described by regular expressions, around matched strings while leaving the match itself unchanged
Enables deterministic parsing, tokenization, and filtering
Abstract
This paper introduces to the finite-state calculus a family of directed replace operators. In contrast to the simple replace expression, UPPER -> LOWER, defined in Karttunen (ACL-95), the new directed version, UPPER @-> LOWER, yields an unambiguous transducer if the lower language consists of a single string. It transduces the input string from left to right, making only the longest possible replacement at each point. A new type of replacement expression, UPPER @-> PREFIX ... SUFFIX, yields a transducer that inserts text around strings that are instances of UPPER. The symbol ... denotes the matching part of the input which itself remains unchanged. PREFIX and SUFFIX are regular expressions describing the insertions. Expressions of the type UPPER @-> PREFIX ... SUFFIX may be used to compose a deterministic parser for a "local grammar" in the sense of Gross (1989). Other useful…
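The left-to-right, longest-match behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's transducer construction: it only emulates the input/output behavior for the special case where UPPER is a finite set of non-empty strings and LOWER, PREFIX, and SUFFIX are each a single fixed string (in the paper they may be regular expressions). The function names `directed_replace` and `directed_insert` are hypothetical, chosen for this illustration.

```python
import re


def _longest_match_pattern(upper):
    """Build a regex that prefers the longest UPPER string at each position.

    Assumption: `upper` is a finite collection of literal strings.
    Python's alternation picks the first alternative that matches, so
    sorting the alternatives longest-first gives longest-match behavior.
    """
    alternatives = sorted((u for u in upper if u), key=len, reverse=True)
    return re.compile("|".join(re.escape(u) for u in alternatives))


def directed_replace(upper, lower, text):
    """Emulate UPPER @-> LOWER for a single-string LOWER (sketch only)."""
    # re.sub scans left to right and resumes after each replaced match,
    # so replaced material is never re-examined.
    return _longest_match_pattern(upper).sub(lambda m: lower, text)


def directed_insert(upper, prefix, suffix, text):
    """Emulate UPPER @-> PREFIX ... SUFFIX with fixed-string insertions."""
    # m.group(0) plays the role of "...": the matched input is kept as-is.
    return _longest_match_pattern(upper).sub(
        lambda m: prefix + m.group(0) + suffix, text
    )


if __name__ == "__main__":
    upper = ["ab", "b", "ba", "aba"]
    print(directed_replace(upper, "x", "aba"))      # longest match "aba" wins
    print(directed_insert(upper, "[", "]", "aba"))  # brackets around the match
```

Because the longest candidate is always taken at the leftmost position, each input has exactly one output in this sketch, which mirrors the unambiguity the abstract claims for the single-string case.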
Taxonomy
Topics: Natural Language Processing Techniques · semigroups and automata theory · DNA and Biological Computing
