Automatic Parallel Corpus Creation for Hindi-English News Translation Task

Aditya Kumar Pathak; Priyankit Acharya; Dilpreet Kaur; Rakesh; Chandra Balabantaray

arXiv:1901.08625 · cs.CL · January 28, 2019

TL;DR

This paper presents an automatic system for generating Hindi-English parallel corpora specifically for news translation, addressing the scarcity of such resources and demonstrating promising quality through performance metrics.

Contribution

The work introduces a novel prototype system that automatically creates Hindi-English parallel corpora for news translation, filling a critical resource gap.

Findings

1. Generated corpus quality verified by performance metrics
2. Prototype system effectively creates parallel data
3. Addresses resource scarcity in Hindi-English news translation

Abstract

Parallel corpora are essential for multilingual NLP tasks and for data-driven applications such as deep learning models and statistical machine translation (SMT) systems. To date, the Hindi-English parallel corpora available for the news translation task are too small to meet the requirements of such systems. In this work, we develop a prototype system for automatic parallel corpus generation that creates a Hindi-English parallel corpus for the news translation task. To verify the quality of the generated corpus, we evaluate it with several performance metrics, and the results are quite interesting.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques