UAST: Unicode Aware Sanskrit Transliteration
Dhruvil Dave, Aneri Dalwadi

TL;DR
This paper introduces UAST, a Unicode-aware transliteration scheme for Sanskrit that addresses encoding incompatibilities between Devanagari, IAST, and Unicode, facilitating better typesetting and script support.
Contribution
The paper presents a novel Unicode-aware transliteration scheme for Sanskrit that resolves encoding issues and supports multiple scripts, with open-source implementations provided.
Findings
Addresses fundamental Unicode encoding issues for Sanskrit transliteration
Provides open-source tools for multiple scripts
Improves compatibility between transliteration schemes and Unicode
Abstract
Devanāgarī is a writing system that has been adopted by various languages, including Sanskrit. The International Alphabet of Sanskrit Transliteration (IAST) is a transliteration scheme for the romanisation of Sanskrit. IAST uses diacritics to represent various characters. On a computer, these are represented using the Unicode standard, which differs from how the Sanskrit language behaves at a very fundamental level. This causes problems when designing typesetting software for Devanāgarī and IAST. We discuss these problems and provide a solution that resolves the incompatibilities between various transliteration and encoding schemes. The base implementation that should be used is available at https://github.com/dhruvildave/uast.rs. Another implementation that extends UAST to around scripts is available at https://github.com/aneri0x4f/uast-cli…
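The mismatch the abstract describes between Unicode and the way Sanskrit text behaves can be seen with a few standard-library calls. The following is a minimal sketch in Python, not the UAST scheme and not code from either repository; the helper `code_points` and all variable names are hypothetical and chosen only for illustration. It shows that one IAST letter can have two different encodings, and that a single Devanāgarī conjunct is stored as several code points.

```python
import unicodedata


def code_points(text):
    """Return the Unicode code points of `text` as 'U+XXXX' strings.

    Hypothetical helper for illustration; not part of UAST.
    """
    return [f"U+{ord(ch):04X}" for ch in text]


# The IAST long vowel "ā" has two valid Unicode encodings.
precomposed = "\u0101"   # single code point: a with macron
decomposed = "a\u0304"   # "a" followed by a combining macron

# They render identically but compare unequal as raw strings.
raw_equal = precomposed == decomposed

# Normalising to NFC makes the two spellings identical.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

# The Devanāgarī conjunct "kṣa" reads as one written unit but is stored
# as three code points: KA, VIRAMA, SSA.
ksha_devanagari = "\u0915\u094D\u0937"

# Its IAST form "kṣa" is three code points in NFC and four in NFD,
# because the dot below "s" becomes a separate combining mark.
ksha_iast = "k\u1E63a"
ksha_iast_nfd = unicodedata.normalize("NFD", ksha_iast)

if __name__ == "__main__":
    print(code_points(precomposed), code_points(decomposed))
    print(raw_equal, nfc_equal)
    print(code_points(ksha_devanagari))
    print(len(ksha_iast), len(ksha_iast_nfd))
```

A transliteration tool that compares or indexes strings code point by code point would treat the two spellings of "ā" as different letters unless it normalises its input first, which is one concrete form of the incompatibility the abstract refers to.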
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques
