Using crowdsourcing system for creating site-specific statistical machine translation engine
Alexander Kalinin, George Savchenko

TL;DR
This paper explores using crowdsourcing to gather domain-specific linguistic data for training tailored statistical machine translation engines, enhancing translation accuracy for specific sites.
Contribution
It introduces a method for collecting site-specific parallel corpora via crowdsourcing to improve domain-adapted machine translation models.
Findings
Crowdsourcing effectively gathers domain-specific translation data.
Site-specific corpora improve translation quality for targeted content.
The approach enables scalable customization of machine translation systems.
Abstract
A crowdsourcing translation approach is an effective tool for globalizing site content, but it is also an important source of parallel linguistic data. For a given site processed with a crowdsourcing system, a sentence-aligned corpus can be extracted that covers a very narrow domain of terminology and language patterns: a site-specific domain. These data can be used for training and estimation of a site-specific statistical machine translation engine.
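As a rough illustration of how a sentence-aligned corpus might be derived from crowdsourced translations, here is a minimal Python sketch. It is not the authors' method. The `Candidate` record, the vote counts, the `min_votes` threshold, and the `build_parallel_corpus` helper are all hypothetical, introduced only for this example. The sketch assumes each source sentence has zero or more contributed translations with community votes, and keeps the top-voted one. The result is two line-aligned lists, one sentence per line, which is the general shape of parallel data that SMT toolkits commonly expect.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical crowdsourced translation with its vote count."""
    text: str
    votes: int


def build_parallel_corpus(records, min_votes=1):
    """Return (source_lines, target_lines), aligned by index.

    records: dict mapping a source sentence to a list of Candidate objects.
    min_votes: assumed quality threshold; sentences whose best candidate
               has fewer votes are skipped.
    """
    src_lines, tgt_lines = [], []
    for source, candidates in records.items():
        if not candidates:
            continue  # no translation contributed yet
        best = max(candidates, key=lambda c: c.votes)
        if best.votes < min_votes:
            continue  # best candidate is not trusted enough
        # Collapse internal whitespace so each sentence stays on one line.
        src_lines.append(" ".join(source.split()))
        tgt_lines.append(" ".join(best.text.split()))
    return src_lines, tgt_lines


records = {
    "Add to cart": [Candidate("Добавить в корзину", 5), Candidate("В корзину", 2)],
    "Checkout": [Candidate("Оформить заказ", 0)],
    "Sign in": [],
}
src, tgt = build_parallel_corpus(records)
```

Writing `src` and `tgt` to two text files with matching line numbers would give a corpus in which line i of one file translates line i of the other.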
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
