Improving Cross-Lingual Reading Comprehension with Self-Training

Wei-Cheng Huang; Chien-yu Huang; Hung-yi Lee

arXiv:2105.03627 · cs.CL · May 11, 2021 · 1 citation

TL;DR

This paper enhances cross-lingual reading comprehension by applying self-training with unlabeled target language data, leading to improved performance across multiple languages.

Contribution

It applies self-training on unlabeled target-language data to improve cross-lingual reading comprehension beyond zero-shot transfer from pre-trained multilingual models.

Findings

1. Performance improved for all tested languages.
2. Self-training benefits are analyzed qualitatively.
3. Method surpasses previous zero-shot approaches.

Abstract

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context. Current state-of-the-art models even surpass human performance on several benchmarks. However, their abilities in the cross-lingual scenario remain largely unexplored. Previous work has shown that pre-trained multilingual models are capable of zero-shot cross-lingual reading comprehension. In this paper, we further use unlabeled data to improve performance. The model is first trained with supervision on a source-language corpus and then self-trained on unlabeled target-language data. Experimental results show improvements for all languages, and we also analyze qualitatively how self-training benefits cross-lingual reading comprehension.
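The two-stage procedure in the abstract (supervised training on the source language, then self-training on unlabeled target-language data) can be illustrated with a minimal, generic pseudo-labeling loop. This is a sketch under stated assumptions, not the authors' implementation: the toy centroid "model", the confidence score, the `threshold` and `rounds` parameters, and the choice to mix source data back in when retraining are all hypothetical stand-ins. A real system would use a pre-trained multilingual reading comprehension model that predicts answer spans.

```python
from typing import Dict, List, Sequence, Tuple

# Toy stand-in: a float feature plays the role of (question, context),
# an int label plays the role of the answer. Purely illustrative.
Example = Tuple[float, int]


def fit_centroids(data: Sequence[Example]) -> Dict[int, float]:
    """Toy 'training': store the mean feature value of each label."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, confidence) using the two nearest centroids.

    Confidence is d_second / (d_first + d_second), a value in [0.5, 1].
    Assumes the model holds at least two labels.
    """
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
    return y0, conf


def self_train(
    source_labeled: Sequence[Example],
    target_unlabeled: Sequence[float],
    rounds: int = 3,          # assumed number of self-training rounds
    threshold: float = 0.7,   # assumed confidence cut-off for pseudo-labels
) -> Dict[int, float]:
    # Stage 1: supervised training on the labeled source-language data.
    model = fit_centroids(source_labeled)
    # Stage 2: repeatedly pseudo-label target data and retrain.
    for _ in range(rounds):
        pseudo: List[Example] = []
        for x in target_unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:  # keep only confident predictions
                pseudo.append((x, y))
        if not pseudo:
            break
        # Assumption: retrain on source data plus pseudo-labeled target data.
        model = fit_centroids(list(source_labeled) + pseudo)
    return model


source = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
target = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]
model = self_train(source, target)
```

In this toy run the centroids shift toward the unlabeled target points that the source-trained model labels confidently, while the ambiguous point (4.0) never passes the threshold and is left out of retraining.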

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications