Using crowdsourcing system for creating site-specific statistical machine translation engine

Alexander Kalinin; George Savchenko

arXiv:1409.5502·cs.CL·September 22, 2014

Open Access

TL;DR

This paper explores using crowdsourcing to gather domain-specific linguistic data for training tailored statistical machine translation engines, enhancing translation accuracy for specific sites.

Contribution

It introduces a method for collecting site-specific parallel corpora via crowdsourcing to improve domain-adapted machine translation models.

Findings

01 Crowdsourcing effectively gathers domain-specific translation data.

02 Site-specific corpora improve translation quality for targeted content.

03 The approach enables scalable customization of machine translation systems.

Abstract

A crowdsourcing translation approach is an effective tool for globalizing site content, but it is also an important source of parallel linguistic data. For a given site processed with a crowdsourcing system, a sentence-aligned corpus can be extracted that covers a very narrow domain of terminology and language patterns: a site-specific domain. These data can be used for training and estimation of a site-specific statistical machine translation engine.
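The step from crowdsourced translations to a sentence-aligned corpus can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record shape `(source, translation, votes)`, the vote threshold, and the function names `build_parallel_corpus`, `split_corpus`, and `to_line_aligned` are all assumptions made for the example. It keeps the highest-voted translation per source sentence, holds out part of the pairs for estimation, and produces line-aligned text of the kind statistical MT toolkits commonly take as input.

```python
def build_parallel_corpus(records, min_votes=1):
    """Keep the highest-voted translation for each source sentence.

    `records` is an iterable of (source, translation, votes) tuples.
    This record shape and the vote threshold are assumptions for
    illustration, not a format described in the paper.
    """
    best = {}
    for source, translation, votes in records:
        source, translation = source.strip(), translation.strip()
        # Skip empty strings and translations with too few votes.
        if not source or not translation or votes < min_votes:
            continue
        if source not in best or votes > best[source][1]:
            best[source] = (translation, votes)
    # Dicts preserve insertion order, so pairs follow first appearance.
    return [(src, tgt) for src, (tgt, _) in best.items()]


def split_corpus(pairs, held_out_every=10):
    """Hold out every n-th pair for estimating engine quality."""
    train, held_out = [], []
    for index, pair in enumerate(pairs, start=1):
        (held_out if index % held_out_every == 0 else train).append(pair)
    return train, held_out


def to_line_aligned(pairs):
    """Return (source_text, target_text) with one sentence per line."""
    source_text = "\n".join(src for src, _ in pairs)
    target_text = "\n".join(tgt for _, tgt in pairs)
    return source_text, target_text


if __name__ == "__main__":
    records = [
        ("Sign in", "Anmelden", 5),
        ("Sign in", "Einloggen", 2),
        ("Settings", "Einstellungen", 3),
    ]
    pairs = build_parallel_corpus(records)
    train, held_out = split_corpus(pairs, held_out_every=2)
    print(to_line_aligned(train))
    print(to_line_aligned(held_out))
```

Line `i` of the source text corresponds to line `i` of the target text, which is what makes the corpus sentence-aligned. The fixed every-n-th split is chosen only for reproducibility in the sketch; a random split would serve equally well.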

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies