Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge

Canwen Xu; Wangchunshu Zhou; Tao Ge; Ke Xu; Julian McAuley; and Furu Wei

arXiv:2104.02704 · cs.CL · June 9, 2021 · 1 citation



TL;DR

This paper introduces a large Chinese dataset for cant understanding, highlighting its complexity and usefulness as a benchmark for evaluating deep language understanding, common sense, and world knowledge in pretrained models.

Contribution

The paper provides the first large Chinese cant dataset, formulates a cant understanding task, and analyzes model performance, advancing computational understanding of cant and related language phenomena.

Findings

01. Pretrained models struggle with cant understanding, which indicates how difficult the task is.

02. Cant understanding requires deep language understanding, common sense, and world knowledge.

03. The dataset serves as a new benchmark for evaluating language models.

Abstract

Cant is important for understanding advertising, comedies and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective. We formulate a task for cant understanding and provide both quantitative and qualitative analyses of word-embedding-similarity methods and pretrained language models. Experiments suggest that such a task requires deep language understanding, common sense, and world knowledge and thus can be a good testbed for pretrained language models and help models perform better on other tasks. The code is available at https://github.com/JetRunner/dogwhistle. The data and leaderboard are available at https://competitions.codalab.org/competitions/30451.
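
The abstract mentions word-embedding similarity as one of the tested approaches but does not give its scoring details here. The sketch below assumes a simplified framing: given the tokens of a cant and a list of candidate hidden words, rank the candidates by cosine similarity between the mean of the cant's word vectors and each candidate's vector. The toy embeddings, the function names (`cosine`, `embed_phrase`, `rank_candidates`), and the mean-pooling choice are hypothetical illustrations, not the authors' code from JetRunner/dogwhistle.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical baseline sketch: a real system would load pretrained word vectors
# instead of the tiny hand-made embedding table used in the usage note.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed_phrase(tokens: List[str], emb: Dict[str, Vector]) -> List[float]:
    """Mean-pool the vectors of known tokens; return a zero vector if none are known."""
    dim = len(next(iter(emb.values())))
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(vals) / len(known) for vals in zip(*known)]


def rank_candidates(cant: List[str], candidates: List[str], emb: Dict[str, Vector]) -> List[str]:
    """Sort candidate hidden words by similarity to the pooled cant vector, best first."""
    query = embed_phrase(cant, emb)
    zero = [0.0] * len(query)
    # Out-of-vocabulary candidates get a zero vector, hence similarity 0.0.
    return sorted(candidates, key=lambda c: cosine(query, emb.get(c, zero)), reverse=True)
```

With a toy table such as `{"bark": [1.0, 0.0], "whistle": [0.9, 0.1], "dog": [1.0, 0.1], "cat": [0.0, 1.0]}`, `rank_candidates(["bark", "whistle"], ["cat", "dog"], emb)` ranks `"dog"` first. A surface-similarity ranker like this cannot see the common sense or world knowledge that the Findings describe as necessary, which is one plausible reason such baselines would struggle on cant.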


Code & Models

Repositories

JetRunner/dogwhistle
PyTorch · Official


Taxonomy

Topics: Humor Studies and Applications · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection