CC-Riddle: A Question Answering Dataset of Chinese Character Riddles

Fan Xu; Yunxiang Zhang; Xiaojun Wan

arXiv:2206.13778·cs.CL·September 26, 2023

TL;DR

This paper introduces CC-Riddle, a comprehensive dataset of Chinese character riddles, and evaluates the ability of various language models to solve these riddles, revealing current limitations.

Contribution

The paper creates and releases a large Chinese character riddle dataset using web crawling, language models, and manual filtering, and assesses language models' performance on this task.

Findings

01

Language models struggle with solving Chinese character riddles.

02

The dataset includes both human-written and generated riddles.

03

Evaluation shows current models have limited success in this task.

Abstract

The Chinese character riddle is a form of cultural entertainment unique to the Chinese language. It typically comprises two parts: the riddle description and the solution. The solution is a single character, while the description primarily describes the glyph of the solution, occasionally supplemented with its explanation and pronunciation. Solving Chinese character riddles is a challenging task that demands an understanding of character glyphs, general knowledge, and a grasp of figurative language. In this paper, we construct a Chinese Character riddle dataset named CC-Riddle, which covers the majority of common simplified Chinese characters. The construction process combines web crawling, language model generation, and manual filtering. In the generation stage, we input the Chinese phonetic alphabet (pinyin), glyph and meaning of the solution…
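The two-part structure described in the abstract (a riddle description plus a single-character solution) can be sketched as a small record type. This is only an illustration: the class name `Riddle`, its field names, and the `exact_match_accuracy` helper are assumptions, not the dataset's actual schema or the paper's evaluation protocol, and the two sample riddles are classic examples that are not taken from CC-Riddle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Riddle:
    """One riddle entry. Field names are hypothetical, not the dataset's schema."""
    description: str  # riddle text, mainly describing the glyph of the solution
    solution: str     # the answer, which must be exactly one character

    def __post_init__(self):
        # The abstract states that the solution is a single character.
        if len(self.solution) != 1:
            raise ValueError("solution must be a single character")


def exact_match_accuracy(riddles, predictions):
    """Fraction of riddles whose predicted character equals the solution.

    A simple exact-match metric chosen for illustration; the paper's own
    metric may differ.
    """
    if len(riddles) != len(predictions):
        raise ValueError("riddles and predictions must have the same length")
    if not riddles:
        return 0.0
    correct = sum(r.solution == p for r, p in zip(riddles, predictions))
    return correct / len(riddles)


# Illustrative entries, not taken from CC-Riddle.
sample = [
    Riddle(description="一加一", solution="王"),  # 一 + 十 + 一 stacks into 王
    Riddle(description="一口咬掉牛尾巴", solution="告"),  # 牛 minus its tail, over 口
]
```

With these samples, a model that predicts the first character correctly and the second incorrectly would score half under this metric, for example `exact_match_accuracy(sample, ["王", "吉"])`.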

Taxonomy

Topics: Topic Modeling · Digital Humanities and Scholarship · Advanced Text Analysis Techniques