Local Translation Services for Neglected Languages
David Noever, Josh Kalin, Matt Ciolino, Dom Hambrick, and Gerry Dozier

TL;DR
This paper develops lightweight, local translation models for neglected and obfuscated languages, demonstrating effective translation of hacker-speak and reverse writing, and extending multilingual translation capabilities with minimal data.
Contribution
The paper introduces a deep learning architecture that translates hacker-speak and other obfuscated languages from small datasets, and it generalizes to multiple languages, including niche dialects.
Findings
Achieved fluent hacker-speak translation with a model of under 50 MB
Generated over a million bilingual sentence pairs for dataset augmentation
Ranked the translators' per-language proficiency, with Italian highest and Mandarin lowest
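Because reverse writing is defined by a simple rule, parallel training data for it can in principle be synthesized from any plain-English corpus. The Python below is a minimal sketch of that idea, not the authors' pipeline. It assumes mirror writing can be approximated in plain text by reversing character order (flipped glyph shapes are not representable in ordinary text), and the names `mirror` and `make_pairs` are hypothetical.

```python
def mirror(sentence: str) -> str:
    """Approximate mirror writing by reversing character order.

    Assumption: only right-to-left order is modeled; flipped letter
    shapes cannot be expressed in plain Unicode text.
    """
    return sentence[::-1]


def make_pairs(sentences):
    """Return (obfuscated source, plain target) pairs for a translator to learn."""
    return [(mirror(s), s) for s in sentences]


corpus = ["The sun does not move.", "Read it in a mirror."]
pairs = make_pairs(corpus)
for src, tgt in pairs:
    print(f"{src}\t{tgt}")
# .evom ton seod nus ehT	The sun does not move.
# .rorrim a ni ti daeR	Read it in a mirror.
```

Since the rule is its own inverse, every generated pair is exactly consistent, which makes this kind of synthetic data a convenient sanity check for a small sequence-to-sequence model.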
Abstract
Computationally lightweight but high-quality translators prompt consideration of new applications that address neglected languages. Locally run translators for less popular languages may assist data projects involving protected or personal data, which may require specific compliance checks before being posted to a public translation API; such projects could instead find reasonable, cost-effective solutions in an army of local, small-scale pair translators. Analogous to handling a specialist's dialect, this research illustrates the translation of two historically interesting but obfuscated languages: 1) hacker-speak ("l33t") and 2) reverse (or "mirror") writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of…
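One way to picture the lite, medium, and hard vocabularies is as progressively larger character-substitution tables. The sketch below assumes that interpretation; the tables in `LEET_TIERS` and the functions `to_leet` and `leet_pairs` are hypothetical illustrations, since this summary does not list the paper's actual vocabularies. Each tier turns plain English into a hacker-speak source sentence, yielding (l33t, English) pairs that a translator could be trained to invert.

```python
# Hypothetical substitution tables for three difficulty tiers; these are
# illustrative assumptions, not the vocabularies used in the paper.
LEET_TIERS = {
    "lite": {"e": "3", "o": "0"},
    "medium": {"e": "3", "o": "0", "a": "4", "t": "7", "i": "1"},
    "hard": {"e": "3", "o": "0", "a": "4", "t": "7", "i": "!",
             "s": "$", "l": "|", "g": "9"},
}


def to_leet(text: str, tier: str) -> str:
    """Rewrite plain text in hacker-speak using the chosen tier's table."""
    table = LEET_TIERS[tier]
    # Apply each rule to both the lowercase and uppercase form of a letter.
    both_cases = {**table, **{k.upper(): v for k, v in table.items()}}
    return text.translate(str.maketrans(both_cases))


def leet_pairs(sentences, tier):
    """Build (hacker-speak source, English target) training pairs."""
    return [(to_leet(s, tier), s) for s in sentences]


for tier in LEET_TIERS:
    print(tier, to_leet("Hello elite hacker", tier))
# lite H3ll0 3lit3 hack3r
# medium H3ll0 3l173 h4ck3r
# hard H3||0 3|!73 h4ck3r
```

The forward direction (English to l33t) is a deterministic lookup, so the learning problem lies in the reverse direction. Heavier tiers substitute symbols that can also appear legitimately in text (digits, `$`, `|`), which makes that inverse more ambiguous and plausibly explains why tiered vocabularies are a useful difficulty knob.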
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
