A Blue Start: A large-scale pairwise and higher-order social network dataset
Alyssa Smith, Ilya Amburg, Sagar Kumar, Brooke Foucault Welles, and Nicholas W. Landry

TL;DR
This paper introduces "A Blue Start", a large-scale social network dataset from Bluesky that captures both pairwise following relationships and higher-order group interactions in the form of starter packs.
Contribution
The paper provides a large-scale dataset that combines pairwise and higher-order social interactions, addressing the gap between higher-order models of group formation and spreading processes and the data needed to validate them.
Findings
Dataset includes 39.7 million users and 2.4 billion relationships.
Contains 365,800 groups representing higher-order social ties.
Enables new research in social network dynamics and higher-order interactions.
Abstract
Large-scale networks have been instrumental in shaping how we think about social systems, and have undergirded many foundational results in mathematical epidemiology, computational social science, and biology. However, many of the social systems through which diseases spread, information disseminates, and individuals interact are inherently mediated through groups, known as higher-order interactions. A gap exists between higher-order models of group formation and spreading processes and the data necessary to validate these mechanisms. Similarly, few datasets bridge the gap between pairwise and higher-order network data. The Bluesky social media platform is an ideal laboratory for observing social ties at scale through its open API. Not only does Bluesky contain pairwise following relationships, but it also contains higher-order social ties known as "starter packs" which are user-curated…
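To make the abstract's distinction between pairwise and higher-order ties concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the handles, the pack names, and the in-memory layout are hypothetical toy data, not the dataset's actual schema or file format. Follows are modeled as directed edges, and each starter pack as a hyperedge, meaning a set of users grouped together at once.

```python
from itertools import combinations

# Toy data only: handles and pack names are hypothetical, not taken from the dataset.
# Pairwise ties: (follower, followed) directed edges.
follows = {
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("dave", "alice"),
}

# Higher-order ties: each starter pack is a set of users (a hyperedge).
starter_packs = {
    "pack_1": {"alice", "bob", "carol"},
    "pack_2": {"carol", "dave"},
}


def pairwise_projection(packs):
    """Return the undirected user pairs that co-occur in at least one pack."""
    pairs = set()
    for members in packs.values():
        # Sorting gives each unordered pair a single canonical tuple.
        for u, v in combinations(sorted(members), 2):
            pairs.add((u, v))
    return pairs


def follow_density(members, follow_edges):
    """Fraction of ordered member pairs (u, v), u != v, where u follows v."""
    n = len(members)
    if n < 2:
        return 0.0
    present = sum(
        1
        for u in members
        for v in members
        if u != v and (u, v) in follow_edges
    )
    return present / (n * (n - 1))


if __name__ == "__main__":
    print(sorted(pairwise_projection(starter_packs)))
    for name, members in sorted(starter_packs.items()):
        print(name, follow_density(members, follows))
```

The projection discards the group structure: once a pack is flattened into pairs, it is no longer possible to tell which pairs came from the same pack. Keeping the follow edges and the pack memberships side by side is what allows questions such as how densely the members of a pack follow one another.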
Taxonomy
Topics: Complex Network Analysis Techniques · COVID-19 epidemiological studies · Advanced Graph Neural Networks
