Language Access: An Information Based Approach
Akshar Bharati, Vineet Chaitanya, Amba P. Kulkarni, Rajeev Sangal

TL;DR
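The paper introduces Anusaaraka, a machine translation system that makes text in one Indian language accessible through another. It presents an "image" of the source text in a language close to the target language, preserving information and supporting reversibility. Specialized modules can be built for narrow subject areas.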
Contribution
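It presents an information-preserving approach to translation between Indian languages: the output is an image of the source text in a language close to the target language, produced so that the strings are substitutable and reversible. Systems have been built for five language pairs, and specialized modules can be built for narrow subject areas.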
Findings
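Anusaarakas have been built for five language pairs: Telugu, Kannada, Marathi, Bengali and Punjabi, each to Hindi.
The system preserves information by following the principle of substitutability and reversibility of the strings produced.
For narrow subject areas, specialized modules can be built by putting subject domain knowledge into the system.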
Abstract
The anusaaraka system (a kind of machine translation system) makes text in one Indian language accessible through another Indian language. The machine presents an image of the source text in a language close to the target language. In the image, some constructions of the source language (which do not have equivalents in the target language) spill over to the output. Some special notation is also devised. Anusaarakas have been built for five pairs of languages: Telugu, Kannada, Marathi, Bengali and Punjabi to Hindi. They are available for use through email servers. Anusaarakas follow the principle of substitutability and reversibility of the strings produced. This implies preservation of information while going from a source language to a target language. For narrow subject areas, specialized modules can be built by putting subject domain knowledge into the system, which produce good…
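The reversibility principle described in the abstract can be illustrated with a toy round trip. This is only a minimal sketch under stated assumptions, not the anusaaraka implementation: the token names, the one-to-one `LEXICON`, and the `{...}` marker for constructions that spill over are all hypothetical and chosen for illustration.

```python
# Toy illustration of reversibility. Everything here is hypothetical:
# the placeholder tokens, the lexicon, and the {...} spill-over marker
# are NOT taken from the anusaaraka system.

# One-to-one mapping from source tokens to target-side tokens,
# so that it can be inverted without loss.
LEXICON = {"s_house": "t_house", "s_go": "t_go"}
REVERSE = {target: source for source, target in LEXICON.items()}


def to_image(tokens):
    """Map each source token to its target-side token.

    Tokens with no equivalent are kept and wrapped in a marker,
    so they spill over into the output instead of being dropped.
    """
    return [LEXICON[t] if t in LEXICON else "{" + t + "}" for t in tokens]


def from_image(tokens):
    """Recover the source tokens from the image."""
    recovered = []
    for t in tokens:
        if t.startswith("{") and t.endswith("}"):
            recovered.append(t[1:-1])  # spilled-over source token
        else:
            recovered.append(REVERSE[t])
    return recovered


source = ["s_house", "s_particle", "s_go"]
image = to_image(source)
# The round trip returns the original tokens: no information was lost.
assert from_image(image) == source
```

Because every source token is either mapped one-to-one or carried through with a marker, the source can be recovered from the image, which is the sense in which information is preserved in this sketch.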
Taxonomy
Topics: Natural Language Processing Techniques
