Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation
Francis Grégoire, Philippe Langlais

TL;DR
This paper introduces a bidirectional RNN method for extracting parallel sentences from multilingual texts, improving machine translation without relying on feature engineering or external resources.
Contribution
The paper presents a bidirectional RNN approach to parallel sentence extraction that achieves promising results against a competitive baseline and improves translation quality when the extracted pairs are used as training data.
Findings
Achieved promising extraction results from noisy corpora
Improved machine translation performance using extracted sentence pairs
Eliminated need for feature engineering or external resources
Abstract
Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications. We propose a bidirectional recurrent neural network based approach to extract parallel sentences from collections of multilingual texts. Our experiments with noisy parallel corpora show that we can achieve promising results against a competitive baseline while removing the need for specific feature engineering or additional external resources. To justify the utility of our approach, we extract sentence pairs from Wikipedia articles to train machine translation systems and show significant improvements in translation performance.
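The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea: encode each sentence with a bidirectional RNN and turn the two encodings into a probability that the pair is parallel. Everything here is an assumption made for illustration, not the authors' model: the vanilla tanh cells, the dot-product scorer, the dimensions, and the names `BiRNNEncoder` and `parallel_probability` are all hypothetical, and the weights are random and untrained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_pass(inputs, w_in, w_rec):
    """Run a vanilla tanh RNN over the inputs and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        from_input = matvec(w_in, x)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return hidden


class BiRNNEncoder:
    """Hypothetical sentence encoder: one RNN reads left-to-right, one right-to-left."""

    def __init__(self, vocab, emb_dim=8, hid_dim=6, seed=0):
        rng = random.Random(seed)
        self.emb = {tok: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for tok in vocab}
        self.unk = [0.0] * emb_dim  # embedding used for out-of-vocabulary tokens
        self.fwd = (make_matrix(hid_dim, emb_dim, rng), make_matrix(hid_dim, hid_dim, rng))
        self.bwd = (make_matrix(hid_dim, emb_dim, rng), make_matrix(hid_dim, hid_dim, rng))

    def encode(self, tokens):
        xs = [self.emb.get(tok, self.unk) for tok in tokens]
        # Concatenate the final forward state with the final backward state.
        return rnn_pass(xs, *self.fwd) + rnn_pass(xs[::-1], *self.bwd)


def parallel_probability(src_vec, tgt_vec, bias=0.0):
    """Assumed scorer: sigmoid of the dot product of the two sentence vectors."""
    score = sum(s * t for s, t in zip(src_vec, tgt_vec)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Usage: one encoder per language, then score a candidate sentence pair.
src_encoder = BiRNNEncoder(["the", "cat", "sleeps"], seed=1)
tgt_encoder = BiRNNEncoder(["le", "chat", "dort"], seed=2)
prob = parallel_probability(
    src_encoder.encode("the cat sleeps".split()),
    tgt_encoder.encode("le chat dort".split()),
)
print(f"P(parallel) = {prob:.3f}")
```

In a trained system the encoders and scorer would be learned jointly from labeled parallel and non-parallel pairs, and candidate pairs whose probability exceeds a chosen threshold would be kept as extracted training data. With the random weights above, the printed probability carries no meaning beyond showing the data flow.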
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
