Social Meme-ing: Measuring Linguistic Variation in Memes
Naitian Zhou, David Jurgens, David Bamman

TL;DR
This paper introduces a computational approach to analyze social and linguistic variation in memes, using a large dataset from Reddit to reveal meaningful community-specific differences and patterns of meme evolution.
Contribution
It presents a novel multimodal clustering pipeline for memes and introduces the SemanticMemes dataset, enabling large-scale analysis of meme-based sociolinguistic variation.
Findings
Memes show significant social variation across Reddit communities.
Patterns of meme innovation align with trends in written language.
Large-scale dataset facilitates sociolinguistic analysis of multimodal content.
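One way to make "social variation across communities" concrete is to compare how often each subreddit uses each meme template. The sketch below is an illustration only. It represents each community as a probability distribution over template ids and compares pairs of communities with Jensen-Shannon divergence (JSD). JSD is a commonly used measure chosen here for brevity, not necessarily the one the paper reports. The community data and template ids (`t1`, `t2`, `t3`) are hypothetical toy values.

```python
import math
from collections import Counter

def template_distribution(posts):
    """posts: template ids observed in one (hypothetical) community.
    Returns a dict mapping template id -> relative frequency."""
    counts = Counter(posts)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl(p, q):
    """KL divergence D(p || q) in bits; assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log2(pt / q[t]) for t, pt in p.items() if pt > 0)

def jensen_shannon(p, q):
    """Symmetric JSD in bits: 0 for identical usage, 1 for disjoint usage."""
    support = set(p) | set(q)
    # Mixture distribution; positive wherever p or q is positive.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example with hypothetical communities and templates.
sub_a = template_distribution(["t1", "t1", "t2", "t1"])
sub_b = template_distribution(["t1", "t1", "t2", "t1"])
sub_c = template_distribution(["t3", "t3"])
sub_d = template_distribution(["t1", "t3"])

same = jensen_shannon(sub_a, sub_b)      # identical usage
disjoint = jensen_shannon(sub_a, sub_c)  # no shared templates
partial = jensen_shannon(sub_a, sub_d)   # some overlap
```

Because JSD is symmetric and bounded, it can be computed for every pair of communities to build a distance matrix over subreddits. In practice, sparse counts would likely need smoothing or a minimum-frequency cutoff before the comparison is meaningful.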
Abstract
Much work in NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language composed of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting SemanticMemes dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on…
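The abstract describes clustering meme instances first into visual templates and then into semantic variables, using both the image and the text. The sketch below illustrates that two-stage idea under strong assumptions, and it is not the authors' pipeline. Each meme is assumed to already carry an image embedding and a caption embedding (the `image_vec` and `text_vec` fields are hypothetical). Grouping uses a simple greedy cosine-threshold ("leader") clustering, chosen only for brevity. The thresholds are arbitrary.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def leader_cluster(vectors, threshold):
    """Greedy single-pass clustering: each vector joins the first cluster whose
    leader (its first member) has cosine similarity >= threshold; otherwise it
    starts a new cluster. Returns one cluster id per input vector."""
    leaders, labels = [], []
    for vec in vectors:
        for cid, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels

def two_stage_cluster(memes, image_threshold=0.9, text_threshold=0.8):
    """Stage 1 groups memes by (hypothetical) image embedding into templates;
    stage 2 splits each template by caption embedding. Returns
    (template_id, semantic_id) pairs; semantic ids are local to each template."""
    template_ids = leader_cluster([m["image_vec"] for m in memes], image_threshold)
    by_template = defaultdict(list)
    for idx, tid in enumerate(template_ids):
        by_template[tid].append(idx)
    result = [None] * len(memes)
    for tid, idxs in by_template.items():
        sem_ids = leader_cluster([memes[i]["text_vec"] for i in idxs], text_threshold)
        for i, sid in zip(idxs, sem_ids):
            result[i] = (tid, sid)
    return result

# Toy 2-D embeddings standing in for real image/text encoder outputs.
memes = [
    {"image_vec": [1.0, 0.0], "text_vec": [1.0, 0.0]},
    {"image_vec": [0.99, 0.05], "text_vec": [0.0, 1.0]},
    {"image_vec": [0.0, 1.0], "text_vec": [1.0, 0.0]},
    {"image_vec": [0.98, 0.1], "text_vec": [0.95, 0.05]},
]
labels = two_stage_cluster(memes)
```

Here the first, second, and fourth memes share a visual template. The second meme's caption differs enough to form its own semantic group within that template. A real system at the 3.8M-image scale would need approximate nearest-neighbor search or a batch clustering method rather than this quadratic single pass.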
Taxonomy
Topics: Digital Communication and Language · Humor Studies and Applications · Language and cultural evolution
Methods: ALIGN
