Using Machine Translation to Localize Task Oriented NLG Output

Scott Roy; Cliff Brunk; Kyu-Young Kim; Justin Zhao; Markus Freitag; Mihir Kale; Gagan Bansal; Sidharth Mudgal; Chris Varano

arXiv:2107.04512·cs.CL·July 12, 2021

TL;DR

This paper investigates using machine translation to localize task-oriented natural language outputs, addressing challenges of quality and domain specificity in multilingual virtual assistants.

Contribution

It introduces a novel approach combining finetuning, web data, semantic annotations, and error detection to improve translation quality for task-specific outputs.

Findings

01

Achieved near-perfect translation quality for task-specific outputs

02

Enhanced translation models with in-domain data and semantic information

03

Developed a scalable distillation model for deployment

Abstract

One of the challenges in a task-oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different from the ones in the machine translation training data. This combination of requirements is novel in the field of domain adaptation for machine translation. We are able to reach the required quality bar by building on existing ideas and adding new ones: finetuning on in-domain translations, adding sentences from the Web, adding semantic annotations, and using…
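
The abstract names semantic annotations as one ingredient but, as excerpted here, does not say what form they take. The following is a purely illustrative sketch, assuming the annotations are inline tags wrapped around dynamic slot values in the English output before it is translated. The tag format, the `annotate` and `strip_annotations` helpers, and the example sentence are all hypothetical and are not taken from the paper.

```python
import re


def annotate(template: str, slots: dict) -> str:
    """Fill {name} placeholders, wrapping each value in <name>...</name> tags.

    Hypothetical annotation scheme: the tags mark which spans are dynamic
    text so a downstream translation model could treat them specially.
    """
    def fill(match):
        name = match.group(1)
        return f"<{name}>{slots[name]}</{name}>"

    return re.sub(r"\{(\w+)\}", fill, template)


def strip_annotations(text: str) -> str:
    """Remove the hypothetical <name> and </name> tags, keeping their contents."""
    return re.sub(r"</?\w+>", "", text)


annotated = annotate(
    "Setting an alarm for {time} on {day}.",
    {"time": "7 AM", "day": "Monday"},
)
plain = strip_annotations(annotated)
```

Under these assumptions, `annotated` carries the slot boundaries alongside the text, and `plain` recovers the sentence a user would actually see.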

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications