Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering
Casimiro Pio Carrino, Marta R. Costa-jussà, José A. R. Fonollosa

TL;DR
This paper presents a method to automatically translate the SQuAD v1.1 dataset into Spanish, enabling the training of high-performing Spanish question answering systems and setting new state-of-the-art results on the Spanish portions of cross-lingual QA benchmarks.
Contribution
The authors introduce the Translate Align Retrieve (TAR) method for translating SQuAD v1.1 to Spanish and demonstrate its effectiveness by fine-tuning Multilingual-BERT on the translated dataset to obtain state-of-the-art Spanish QA models.
Findings
Achieved 68.1 F1 on Spanish MLQA benchmark
Achieved 77.6 F1 and 61.8 EM on Spanish XQuAD benchmark
First large-scale Spanish QA training dataset from SQuAD
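The F1 and Exact Match (EM) figures above are the usual extractive-QA metrics: EM checks whether the predicted answer string equals the gold answer after normalization, and F1 measures token overlap between the two. The following is a minimal sketch of how such scores are typically computed for a single prediction. The normalization here (lowercasing, stripping ASCII punctuation plus the Spanish marks ¿ and ¡) is an assumption made for illustration; the official MLQA and XQuAD evaluation scripts apply their own normalization, including article removal, which this sketch omits.

```python
import string
from collections import Counter
from typing import List

# Assumption: punctuation to drop is ASCII punctuation plus Spanish inverted marks.
_DROP = set(string.punctuation) | {"¿", "¡"}


def normalize(text: str) -> List[str]:
    """Lowercase, remove punctuation, and split on whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch not in _DROP)
    return cleaned.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred, ref = normalize(prediction), normalize(gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("¿Madrid?", "madrid"))
    print(round(f1_score("en Madrid", "Madrid"), 4))
```

Benchmark scores such as those listed above are averages of these per-question values over the whole evaluation set, usually taking the maximum over all gold answers when a question has several.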
Abstract
Recently, multilingual question answering became a crucial research topic, and it is receiving increased interest in the NLP community. However, the unavailability of large-scale datasets makes it challenging to train multilingual QA systems with performance comparable to the English ones. In this work, we develop the Translate Align Retrieve (TAR) method to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 to Spanish. We then used this dataset to train Spanish QA systems by fine-tuning a Multilingual-BERT model. Finally, we evaluated our QA models with the recently proposed MLQA and XQuAD benchmarks for cross-lingual Extractive QA. Experimental results show that our models outperform the previous Multilingual-BERT baselines achieving the new state-of-the-art value of 68.1 F1 points on the Spanish MLQA corpus and 77.6 F1 and 61.8 Exact Match points on the…
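The abstract names the three stages of TAR but does not describe them in detail. As a rough illustration only, the sketch below shows one plausible way a "retrieve" step could work: given word alignments between an English context and its Spanish translation, the answer span in the source is projected onto the smallest target span that covers all aligned tokens. The function name `retrieve_target_span`, the alignment format (a list of `(source_index, target_index)` pairs), and the min/max projection rule are all assumptions for this example, not the authors' implementation.

```python
from typing import Iterable, Optional, Tuple


def retrieve_target_span(
    src_start: int,
    src_end: int,
    alignment: Iterable[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Project an inclusive source token span onto the target sentence.

    alignment holds (source_index, target_index) pairs, assumed to come
    from some word aligner. Returns the inclusive target span covering
    every target token aligned to the source span, or None if no token
    in the span is aligned.
    """
    targets = [t for s, t in alignment if src_start <= s <= src_end]
    if not targets:
        return None
    return min(targets), max(targets)


if __name__ == "__main__":
    source = ["the", "red", "house"]
    target = ["la", "casa", "roja"]
    # Hypothetical alignment: the->la, red->roja, house->casa
    alignment = [(0, 0), (1, 2), (2, 1)]

    span = retrieve_target_span(1, 2, alignment)  # source answer: "red house"
    if span is not None:
        start, end = span
        print(" ".join(target[start : end + 1]))
```

Taking the minimum and maximum aligned indices keeps the projected answer contiguous even when word order differs between the two languages, at the cost of occasionally including unaligned tokens that fall inside the span.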
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
