Polish to English Statistical Machine Translation

Krzysztof Wołk

arXiv:1510.00001 · cs.CL · October 2, 2015 · 1 cites

TL;DR

This paper investigates how different training configurations and data sources impact the performance of a Polish to English statistical machine translation system for spoken language, using multiple evaluation metrics.

Contribution

It presents an analysis of training settings and data preparation effects on translation quality for Polish-English SMT systems.

Findings

01 Data quality and preparation significantly influence translation accuracy.

02 Different corpora sources yield varying translation performance.

03 Evaluation metrics show consistent trends across experiments.

Abstract

This research explores how various training settings affect a Polish to English Statistical Machine Translation system for spoken language. Parts of the TED, Europarl, and OPUS parallel text corpora were used to train the language models and to develop, tune, and test the translation system. The BLEU, NIST, METEOR, and TER metrics were used to evaluate the effects of data preparation on the translation results.
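To make the first of these metrics concrete, here is a minimal sketch of an unsmoothed, single-reference, sentence-level BLEU score. It is an illustration only and not the scoring setup used in the paper, which this page does not describe. BLEU is normally reported at corpus level, and the function names and example sentences below are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one tokenized sentence pair."""
    precisions = [modified_precision(hypothesis, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences, already tokenized and lowercased.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on".split()
score = sentence_bleu(hypothesis, reference)
```

Here every n-gram in the hypothesis also appears in the reference, so the score is determined entirely by the brevity penalty. For real evaluations an established scoring tool is preferable, since tokenization and smoothing choices change the reported numbers.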

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies