GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru
Gabriela Pinto, Keith Burghardt, Kristina Lerman, Emilio Ferrara

TL;DR
This paper introduces GET-Tok, a pipeline that augments data from the TikTok Research API with generative AI models so that multimodal video content can be analyzed as text. It is demonstrated on videos about Peru's 2022 attempted coup and the accompanying protests.
Contribution
The paper presents a novel pipeline that combines TikTok data collection with generative AI augmentation and applies it to non-English social media content.
Findings
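Collected information on 43,697 videos about Peru's attempted coup, published from November 20, 2022 to March 1, 2023 (102 days).
Generated transcripts, descriptions of what is shown, on-screen text, and stance information using generative AI models.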
Provides a publicly available codebase for replicating the pipeline.
Abstract
TikTok is one of the largest and fastest-growing social media sites in the world. TikTok features, however, such as voice transcripts, are often missing and other important features, such as OCR or video descriptions, do not exist. We introduce the Generative AI Enriched TikTok (GET-Tok) data, a pipeline for collecting TikTok videos and enriched data by augmenting the TikTok Research API with generative AI models. As a case study, we collect videos about the attempted coup in Peru initiated by its former President, Pedro Castillo, and its accompanying protests. The data includes information on 43,697 videos published from November 20, 2022 to March 1, 2023 (102 days). Generative AI augments the collected data via transcripts of TikTok videos, text descriptions of what is shown in the videos, what text is displayed within the video, and the stances expressed in the video. Overall, this…
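The enrichment step described in the abstract can be pictured as attaching four model-derived text fields to each collected video. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `EnrichedVideo`, `enrich_video`, and `ENRICHMENT_FIELDS`, the shape of `metadata`, and the idea of passing each model as a plain callable are all hypothetical and chosen only for illustration. No real TikTok Research API or model call is made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# The four enrichment outputs named in the abstract (field names are hypothetical).
ENRICHMENT_FIELDS = ("transcript", "visual_description", "on_screen_text", "stance")


@dataclass
class EnrichedVideo:
    """One collected video plus its generative-AI-derived fields."""
    video_id: str
    metadata: Dict[str, str] = field(default_factory=dict)  # whatever the collection step returned
    transcript: Optional[str] = None          # speech in the video
    visual_description: Optional[str] = None  # what is shown in the video
    on_screen_text: Optional[str] = None      # text displayed within the video
    stance: Optional[str] = None              # stance expressed in the video


# Assumed interface: an enricher takes a local video path and returns text.
Enricher = Callable[[str], str]


def enrich_video(video_id: str, video_path: str, metadata: Dict[str, str],
                 enrichers: Dict[str, Enricher]) -> EnrichedVideo:
    """Run each supplied enricher and store its output on the record."""
    unknown = set(enrichers) - set(ENRICHMENT_FIELDS)
    if unknown:
        raise ValueError(f"unknown enrichment fields: {sorted(unknown)}")
    record = EnrichedVideo(video_id=video_id, metadata=dict(metadata))
    for name, fn in enrichers.items():
        try:
            setattr(record, name, fn(video_path))
        except Exception:
            # A failed model call leaves the field as None instead of dropping the video.
            continue
    return record


if __name__ == "__main__":
    # Stub enrichers stand in for real models.
    stubs = {
        "transcript": lambda path: "placeholder transcript",
        "stance": lambda path: "placeholder stance",
    }
    print(enrich_video("example-id", "example.mp4", {"source": "stub"}, stubs))
```

Keeping each enricher as an independent callable means a missing or failing model leaves a gap in one field rather than removing the video from the dataset. Whether the actual pipeline chains steps (for example, deriving stance from the transcript) is not stated in the text above.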
Taxonomy
Topics: Misinformation and Its Impacts
