OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context
Steffen Kleinle, Jakob Prange, Annemarie Friedrich

TL;DR
This paper introduces OMoS-QA, a bilingual dataset for extractive question answering in German and English related to immigration, and evaluates five pretrained LLMs on this task, highlighting their strengths and limitations.
Contribution
The creation of a new bilingual dataset for cross-lingual extractive QA in a migration context and a comparative analysis of five pretrained LLMs' performance.
Findings
High precision and low-to-mid recall across models and languages.
Performance remains stable when question and document languages differ.
Larger differences observed in unanswerable question detection between languages.
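The "high precision, low-to-mid recall" pattern is easiest to picture at the sentence level: a model that selects few answer sentences is usually right about them but misses others. The sketch below shows one way to score this. It assumes the following, none of which is taken from the paper's own evaluation code:

- predictions and gold answers are sets of sentence indices;
- an empty gold set marks an unanswerable question;
- the helper name `sentence_prf` is hypothetical.

```python
from typing import Set, Tuple


def sentence_prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Hypothetical per-question precision, recall and F1 over sentence indices.

    Assumption: an empty gold set means the question is unanswerable, and
    predicting no sentences for such a question counts as fully correct.
    """
    if not gold and not predicted:
        return 1.0, 1.0, 1.0  # correctly abstained on an unanswerable question
    tp = len(predicted & gold)  # sentences selected by both model and annotators
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# A cautious model picks one correct sentence out of three gold sentences.
p, r, f = sentence_prf({2}, {2, 3, 7})
print(p, round(r, 3), round(f, 3))  # 1.0 0.333 0.5
```

Under this scoring, answering an unanswerable question with any sentence yields zero precision and zero recall. That is one simple way to fold unanswerable-question detection into the same metric, though the paper may treat it separately.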
Abstract
When immigrating to a new country, it is easy to feel overwhelmed by the need to obtain information on financial support, housing, schooling, language courses, and other issues. If relocation is rushed or even forced, the necessity for high-quality answers to such questions is all the more urgent. Official immigration counselors are usually overbooked, and online systems could guide newcomers to the requested information or a suitable counseling service. To this end, we present OMoS-QA, a dataset of German and English questions paired with relevant trustworthy documents and manually annotated answers, specifically tailored to this scenario. Questions are automatically generated with an open-source large language model (LLM) and answer sentences are selected by crowd workers with high agreement. With our data, we conduct a comparison of 5 pretrained LLMs on the task of extractive…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
