MemeCMD: An Automatically Generated Chinese Multi-turn Dialogue Dataset with Contextually Retrieved Memes

Yuheng Wang; Xianhe Tang; Pufeng Huang

arXiv:2507.00891 · cs.CL · July 2, 2025



Open Access

TL;DR

MemeCMD is a large-scale Chinese multi-turn dialogue dataset that incorporates contextually relevant memes. It is built by automatically generating dialogues and retrieving memes that fit their context, with the aim of supporting multimodal conversational AI.

Contribution

The paper introduces MemeCMD, a novel automatically generated dataset combining dialogues with contextually retrieved memes for Chinese multi-turn conversations.

Findings

01 Effective retrieval framework for relevant memes

02 Diverse and contextually appropriate meme-incorporated dialogues

03 Scalable, privacy-preserving dataset for multimodal AI

Abstract

Memes are widely used in online social interactions, providing vivid, intuitive, and often humorous means to express intentions and emotions. Existing dialogue datasets are predominantly limited to either manually annotated or pure-text conversations, lacking the expressiveness and contextual nuance that multimodal interactions provide. To address these challenges, we introduce MemeCMD, an automatically generated Chinese Multi-turn Dialogue dataset with contextually retrieved memes. Our dataset combines a large-scale, MLLM-annotated meme library with dialogues auto-generated by dual agents across diverse scenarios. We introduce a retrieval framework and adaptive threshold to ensure contextually relevant, naturally spaced meme usage. Experiments demonstrate the effectiveness of our approach in generating contextually appropriate and diverse meme-incorporated dialogues, offering a scalable…
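The abstract does not say which retrieval model or threshold rule the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that each dialogue turn and each meme's MLLM-generated annotation have already been embedded as vectors. Each turn is matched to its most similar meme by cosine similarity. A hypothetical "boost and decay" threshold then keeps meme usage spaced out: attaching a meme raises the bar for the next turn, and the bar relaxes back toward a base value on turns without a meme. The names `Meme`, `attach_memes`, `base`, `boost`, and `decay` are illustrative inventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Meme:
    meme_id: str
    # Hypothetical: embedding of the meme's MLLM-generated text annotation.
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attach_memes(turn_embeddings, library, base=0.6, boost=0.3, decay=0.1):
    """Return one meme_id (or None) per dialogue turn.

    Assumed adaptive threshold (not from the paper): after a meme is attached,
    the threshold jumps to base + boost, then drops by `decay` on each turn
    without a meme until it is back at `base`. Requires a non-empty library.
    """
    threshold = base
    chosen = []
    for emb in turn_embeddings:
        # Retrieve the most similar meme for this turn.
        best_score, best_id = max(
            ((cosine(emb, m.embedding), m.meme_id) for m in library),
            key=lambda pair: pair[0],
        )
        if best_score >= threshold:
            chosen.append(best_id)
            threshold = base + boost  # make the next meme harder to trigger
        else:
            chosen.append(None)
            threshold = max(base, threshold - decay)  # relax toward base
    return chosen


if __name__ == "__main__":
    library = [Meme("laugh", [1.0, 0.0]), Meme("facepalm", [0.0, 1.0])]
    turns = [[1.0, 0.0], [0.6, 0.4], [0.0, 1.0], [0.2, 0.2]]
    print(attach_memes(turns, library))  # ['laugh', None, 'facepalm', None]
```

In this toy run, the second turn matches "laugh" with a similarity of about 0.83. That would clear the base threshold of 0.6, but it falls short of the raised threshold left by the first turn, so no meme is attached. This suppression of back-to-back memes is the kind of "naturally spaced" behavior the abstract describes. How the paper actually adapts its threshold is not stated here.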



Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications