Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2013

Krzysztof Wołk; Krzysztof Marasek

arXiv:1509.09097·cs.CL·October 1, 2015

TL;DR

This paper investigates the impact of different training configurations on Polish-English speech translation systems using TED data, focusing on data preparation, morphological features, and evaluation metrics.

Contribution

It introduces a detailed analysis of Polish data for SMT, including morphological processing and data cleaning, to improve translation quality.

Findings

1. Morphological processing improves translation accuracy.

2. Data cleaning significantly enhances system performance.

3. Evaluation metrics show consistent improvements across experiments.

Abstract

This research explores the effects of various training settings on a Polish-to-English statistical machine translation system for spoken language. Parts of the TED parallel text corpora from the IWSLT 2013 evaluation campaign were used to train the language models and to develop, tune, and test the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparation on translation results. Our experiments included systems that use stems and morphological information on Polish words. We also conducted a deep analysis of the provided Polish data as preparatory work for the automatic data correction and cleaning phase.
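To make the first of the metrics named in the abstract more concrete, here is a minimal sketch of a sentence-level, single-reference BLEU score: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is an illustration only, not the scoring tool used in the paper or in the IWSLT campaign. The function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical, and the sketch assumes whitespace-tokenized input and applies no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Return (clipped matches, total hypothesis n-grams) for order n.

    Each hypothesis n-gram is credited at most as many times as it
    occurs in the reference (the "clipping" step of BLEU).
    """
    hyp_counts = ngrams(hyp, n)
    ref_counts = ngrams(ref, n)
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped, total


def bleu(hyp, ref, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        clipped, total = modified_precision(hyp, ref, n)
        if clipped == 0 or total == 0:
            # Without smoothing, one empty precision zeroes the whole score.
            return 0.0
        log_sum += math.log(clipped / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum / max_n)


reference = "the cat sat on the mat".split()
print(bleu("the cat sat on the mat".split(), reference))
print(bleu("the cat sat on".split(), reference))
```

The second call shows the brevity penalty at work: every n-gram of the shorter hypothesis appears in the reference, yet the score drops below that of the exact match because the hypothesis is shorter than the reference. Reported results are normally computed at corpus level with an established scoring script rather than a sketch like this one.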
