Automatic Parallel Corpus Creation for Hindi-English News Translation Task
Aditya Kumar Pathak, Priyankit Acharya, Dilpreet Kaur, Rakesh Chandra Balabantaray

TL;DR
This paper presents an automatic system for generating Hindi-English parallel corpora specifically for news translation, addressing the scarcity of such resources and demonstrating promising quality through performance metrics.
Contribution
The work introduces a novel prototype system that automatically creates Hindi-English parallel corpora for news translation, filling a critical resource gap.
Findings
Generated corpus quality verified by performance metrics
Prototype system effectively creates parallel data
Addresses resource scarcity in Hindi-English news translation
Abstract
A parallel corpus is very important for multilingual NLP tasks, deep learning applications, and statistical machine translation systems. The Hindi-English parallel corpus available to date for the news translation task is very limited in size relative to what such systems require. In this work, we have developed a prototype of an automatic parallel corpus generation system that creates a Hindi-English parallel corpus for the news translation task. To verify the quality of the generated parallel corpus, we evaluated it using various performance metrics, and the results are quite interesting.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
