# Automatic Parallel Corpus Creation for Hindi-English News Translation Task

**Authors:** Aditya Kumar Pathak, Priyankit Acharya, Dilpreet Kaur, Rakesh Chandra Balabantaray

arXiv: 1901.08625 · 2019-01-28

## TL;DR

This paper presents a prototype system that automatically generates Hindi-English parallel corpora for the news translation task, addressing the limited size of existing resources, and evaluates the quality of the generated corpus with several performance metrics.

## Contribution

The work introduces a prototype system that automatically creates Hindi-English parallel corpora for news translation, a language pair and domain in which existing parallel data is very limited.

## Key findings

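- The quality of the generated corpus is evaluated with several performance metrics
- A prototype system automatically creates Hindi-English parallel data for news translation
- The work targets the limited size of existing Hindi-English parallel corpora for news translation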

## Abstract

A parallel corpus is very important for multilingual NLP tasks and deep learning applications such as Statistical Machine Translation systems. The Hindi-English parallel corpus available to date for the news translation task is very limited in size relative to what such systems require. In this work, we have developed a prototype of an automatic parallel corpus generation system, which creates a Hindi-English parallel corpus for the news translation task. To verify the quality of the generated parallel corpus, we experimented with various performance metrics, and the results are quite interesting.

---
Source: https://tomesphere.com/paper/1901.08625