BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
Yunxiang Zhang, Xiaojun Wan

TL;DR
BiRdQA is a bilingual dataset of riddles in English and Chinese designed to evaluate and improve machine understanding of figurative language and reasoning, revealing current models' limitations.
Contribution
The paper introduces BiRdQA, a large-scale bilingual riddle dataset with distractors, highlighting the challenges for existing QA models in solving tricky riddles.
Findings
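Existing monolingual and multilingual QA models perform poorly on BiRdQA
The dataset contains 15,365 riddles (6,614 English and 8,751 Chinese), each with four distractors
BiRdQA reveals gaps in how current models understand figurative language and reason with commonsense knowledge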
Abstract
A riddle is a question or statement with double or veiled meanings, followed by an unexpected answer. Solving riddles is a challenging task for both machines and humans, testing the ability to understand figurative, creative natural language and to reason with commonsense knowledge. We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6,614 English riddles and 8,751 Chinese riddles. For each riddle-answer pair, we provide four distractors with additional information from Wikipedia. The distractors are automatically generated at scale with minimal bias. Existing monolingual and multilingual QA models fail to perform well on our dataset, indicating that there is a long way to go before machines can beat humans at solving tricky riddles. The dataset has been released to the community.
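The multiple-choice format described in the abstract (one correct answer plus four distractors per riddle) can be sketched roughly as follows. This is a minimal illustration and not the released data schema: the `RiddleItem` class, its field names, and the `make_item` and `accuracy` helpers are all hypothetical, and the sample riddle is made up for illustration rather than taken from BiRdQA.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Riddle counts as stated in the abstract.
EN_RIDDLES = 6614
ZH_RIDDLES = 8751
TOTAL_RIDDLES = EN_RIDDLES + ZH_RIDDLES

# One correct answer plus four distractors gives five choices per riddle.
NUM_CHOICES = 5
RANDOM_BASELINE = 1 / NUM_CHOICES  # expected accuracy of uniform guessing


@dataclass
class RiddleItem:
    """Hypothetical record layout; field names are assumptions."""
    riddle: str
    choices: List[str]  # the answer and four distractors, shuffled
    label: int          # index of the correct answer in `choices`
    language: str       # "en" or "zh"


def make_item(riddle: str, answer: str, distractors: List[str],
              language: str, rng: random.Random) -> RiddleItem:
    """Combine an answer with exactly four distractors and shuffle them."""
    if len(distractors) != NUM_CHOICES - 1:
        raise ValueError("expected exactly four distractors")
    choices = [answer] + list(distractors)
    rng.shuffle(choices)
    return RiddleItem(riddle, choices, choices.index(answer), language)


def accuracy(items: List[RiddleItem],
             predict: Callable[[RiddleItem], int]) -> float:
    """Fraction of items where the predicted index matches the label."""
    correct = sum(1 for item in items if predict(item) == item.label)
    return correct / len(items)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up example riddle, not drawn from the dataset.
    item = make_item(
        "What has keys but cannot open a single lock?",
        "piano",
        ["door", "river", "clock", "candle"],
        "en",
        rng,
    )
    print(item.choices, item.label)
    print("total riddles:", TOTAL_RIDDLES)
    print("random-guess baseline:", RANDOM_BASELINE)
```

With five choices per riddle, uniform guessing is expected to score 20% accuracy, which is the floor against which any model's accuracy on this format should be read.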
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
