Compositional Generalization in Multilingual Semantic Parsing over Wikidata
Ruixiang Cui, Rahul Aralikatte, Heather Lent, Daniel Hershcovich

TL;DR
This paper introduces MCWQ, a multilingual dataset for semantic parsing over Wikidata, uses it to analyze how well current models generalize compositionally within and across languages, and shows that zero-shot cross-lingual transfer remains a challenge.
Contribution
The paper presents a new multilingual dataset, MCWQ, and provides an analysis of compositional generalization in semantic parsing across multiple languages, revealing cross-lingual transfer limitations.
Findings
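Within-language compositional generalization is comparable across Hebrew, Kannada, Chinese and English.
Zero-shot cross-lingual compositional generalization fails with current models.
Grounded in Wikidata and covering four languages, the dataset enables more realistic semantic parsing research than English-only, Freebase-oriented benchmarks.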
Abstract
Semantic parsing (SP) allows humans to leverage vast knowledge resources through natural interaction. However, parsers are mostly designed for and evaluated on English resources, such as CFQ (Keysers et al., 2020), the current standard benchmark based on English data generated from grammar rules and oriented towards Freebase, an outdated knowledge base. We propose a method for creating a multilingual, parallel dataset of question-query pairs, grounded in Wikidata. We introduce such a dataset, which we call Multilingual Compositional Wikidata Questions (MCWQ), and use it to analyze the compositional generalization of semantic parsers in Hebrew, Kannada, Chinese and English. While within-language generalization is comparable across languages, experiments on zero-shot cross-lingual transfer demonstrate that cross-lingual compositional generalization fails, even with state-of-the-art…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
