TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger
Anastasia Golovin, Sebastian B. Mohr, Arne I. Gottwald, Ulrik Hvid, Srushhti Trivedi, Joao Pinheiro Neto, Andreas C. Schneider, Viola Priesemann

TL;DR
TeraGram is a comprehensive longitudinal dataset of over 5.9 billion public Telegram messages from 2015 to 2025, enabling diverse social media research across languages and communities.
Contribution
TeraGram provides a large, multilingual, longitudinal dataset of public Telegram content with metadata on forwards, reactions, and polls, facilitating studies of social media dynamics in the absence of content-curation algorithms.
Findings
Enables cross-lingual and community comparison studies.
Supports research on engagement, network evolution, and community formation.
Offers a setting for studying social dynamics on a platform without content-curation algorithms.
Abstract
Here we present a massive longitudinal dataset of public Telegram content, comprising over 5.9 billion messages dating from 2015 to 2025, collected from 712 thousand channels and groups, enriched with metadata on forwards, reactions, and polls. The dataset spans multiple languages including Russian and Farsi, representing countries where Telegram shows mainstream adoption, as well as Western languages where Telegram is used in specific sub-communities. The dataset has several advantages. First, when restricted by language, it provides a versatile example of an algorithm-free platform, contrary to many other social media platforms that are strongly influenced by opaque content-curation algorithms. Second, it enables comparative studies across different languages, communities, and user bases under identical platform affordances. The dataset thus offers a foundation for studying engagement…
