CC-Riddle: A Question Answering Dataset of Chinese Character Riddles
Fan Xu, Yunxiang Zhang, Xiaojun Wan

TL;DR
This paper introduces CC-Riddle, a comprehensive dataset of Chinese character riddles, and evaluates the ability of various language models to solve these riddles, revealing current limitations.
Contribution
The paper creates and releases a large Chinese character riddle dataset using web crawling, language models, and manual filtering, and assesses language models' performance on this task.
Findings
Language models struggle with solving Chinese character riddles.
The dataset includes both human-written and generated riddles.
Evaluation shows current models have limited success in this task.
Abstract
The Chinese character riddle is a unique form of cultural entertainment specific to the Chinese language. It typically comprises two parts: the riddle description and the solution. The solution to the riddle is a single character, while the riddle description primarily describes the glyph of the solution, occasionally supplemented with its explanation and pronunciation. Solving Chinese character riddles is a challenging task that demands an understanding of character glyphs, general knowledge, and a grasp of figurative language. In this paper, we construct a Chinese Character riddle dataset named CC-Riddle, which covers the majority of common simplified Chinese characters. The construction process combines web crawling, language model generation, and manual filtering. In the generation stage, we input the Chinese phonetic alphabet, glyph and meaning of the solution…
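To make the two-part structure described in the abstract concrete, here is a minimal Python sketch of how one riddle entry might be represented and scored. It is an illustration under stated assumptions, not the dataset's actual schema or the paper's evaluation protocol: the class name `Riddle`, the field names, and the `exact_match_accuracy` helper are all hypothetical, and the sample riddle is a commonly cited one rather than an entry taken from CC-Riddle. The only constraint drawn directly from the abstract is that the solution is a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Riddle:
    """One riddle entry (hypothetical schema, not the released format)."""

    description: str   # riddle text, mostly hinting at the solution's glyph
    solution: str      # the answer; the abstract says it is one character
    pinyin: str = ""   # optional pronunciation (assumed field)
    meaning: str = ""  # optional gloss of the solution (assumed field)

    def __post_init__(self):
        # Enforce the single-character constraint stated in the abstract.
        if len(self.solution) != 1:
            raise ValueError("solution must be exactly one character")


def exact_match_accuracy(riddles, predictions):
    """Fraction of riddles whose predicted character equals the solution.

    Exact match is an assumed metric chosen for illustration.
    """
    if len(riddles) != len(predictions):
        raise ValueError("riddles and predictions must have the same length")
    if not riddles:
        return 0.0
    correct = sum(r.solution == p for r, p in zip(riddles, predictions))
    return correct / len(riddles)


# Illustrative example: "一加一" read as the strokes 一, 十, 一 stacks into 王.
example = Riddle(description="一加一", solution="王", pinyin="wáng", meaning="king")
print(exact_match_accuracy([example], ["王"]))
```

A model's raw output would need to be reduced to a single character before being passed to a helper like this; how that is done is left open here.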
Taxonomy
Topics: Topic Modeling · Digital Humanities and Scholarship · Advanced Text Analysis Techniques
