ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System
Chia-Chien Hung, Tommaso Green, Robert Litschko, Tornike Tsereteli, Sotaro Takeshita, Marco Bombieri, Goran Glavaš, Simone Paolo Ponzetto

TL;DR
This paper presents ZusammenQA, a cross-lingual open-retrieval question answering system that leverages specialized models, data augmentation, and multilingual training to improve performance across multiple languages, especially low-resource ones.
Contribution
It introduces a novel combination of data augmentation, specialized passage retrieval, and answer generation models tailored for cross-lingual QA tasks.
Findings
Data augmentation improves low-resource language performance.
Specialized models enhance retrieval and answer quality.
Language- and domain-specific training benefits overall accuracy.
Abstract
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA). In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate from them an answer in the language of the question. We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation. For passage retrieval, we evaluated the monolingual BM25 ranker against the ensemble of re-rankers based on multilingual pretrained language models (PLMs) and also variants of the shared task baseline, re-training it from scratch using a recently introduced contrastive loss that maintains a strong gradient signal throughout training by means of mixed negative samples. For answer generation, we focused on language- and…
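The monolingual BM25 ranker mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard Okapi BM25 formula with a smoothed IDF, pre-tokenized inputs, and commonly used default values for `k1` and `b`. The function names `bm25_scores` and `rank` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Assumptions (not from the paper): whitespace-style token lists as input,
    smoothed IDF log((N - df + 0.5) / (df + 0.5) + 1), default k1 and b.
    """
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, docs, **kwargs):
    """Return document indices sorted from highest to lowest BM25 score."""
    scores = bm25_scores(query, docs, **kwargs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy passage pool (hypothetical data, for illustration only).
passages = [
    ["the", "cat", "sat"],
    ["dogs", "bark", "loudly"],
    ["a", "cat", "and", "a", "cat"],
]
ranking = rank(["cat"], passages)
```

In a pipeline of the kind the abstract describes, a lexical ranker like this would typically produce a candidate list that the PLM-based re-rankers then reorder; how the candidates are actually passed between components is not specified in the text above.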
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
