TWikiL -- The Twitter Wikipedia Link Dataset
Florian Meier

TL;DR
TWikiL is a dataset of all Wikipedia links posted on Twitter from 2006 to January 2021, enabling analysis of Wikipedia's presence on Twitter and how it has changed over time.
Contribution
The paper introduces TWikiL, a dataset of Wikipedia links posted on Twitter in which the referenced articles are enriched with Wikidata identifiers and categories, supporting research on how Wikipedia and social media platforms interact.
Findings
The dataset covers the period from 2006 to January 2021.
Referenced articles are enriched with Wikidata IDs and categories.
Initial analysis shows evolving Wikipedia link sharing patterns.
Abstract
Recent research has shown how strongly Wikipedia and other web services or platforms are connected. For example, search engines rely heavily on surfacing Wikipedia links to satisfy their users' information needs and volunteer-created Wikipedia content frequently gets re-used on other social media platforms like Reddit. However, publicly accessible datasets that enable researchers to study the interrelationship between Wikipedia and other platforms are sparse. In addition to that, most studies only focus on certain points in time and don't consider the historical perspective. To begin solving these problems we developed TWikiL, the Twitter Wikipedia Link Dataset, which contains all Wikipedia links posted on Twitter in the period 2006 to January 2021. We extract Wikipedia links from Tweets and enrich the referenced articles with their respective Wikidata identifiers and Wikipedia topic…
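The link-extraction step mentioned in the abstract could look roughly like the following Python sketch. It is not the author's pipeline: the function name `extract_wikipedia_links`, the URL patterns, and the handling of mobile subdomains and trailing punctuation are all assumptions made for illustration, and the Wikidata enrichment step is not covered.

```python
import re
from urllib.parse import unquote, urlparse

# Assumption: article links use a language-edition subdomain, optionally
# followed by the mobile label "m", e.g. en.wikipedia.org or de.m.wikipedia.org.
WIKI_HOST = re.compile(
    r"^(?!www\.)(?P<lang>[a-z]{2,3}(?:-[a-z0-9-]+)?)(?:\.m)?\.wikipedia\.org$"
)
# Assumption: a URL is any http(s) token without whitespace or quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_wikipedia_links(text):
    """Return (language, title) pairs for Wikipedia article URLs in text."""
    links = []
    for raw_url in URL_PATTERN.findall(text):
        try:
            # Drop sentence punctuation that often trails a URL in free text.
            parsed = urlparse(raw_url.rstrip(".,;:"))
            host = (parsed.hostname or "").lower()
        except ValueError:
            continue  # malformed URL, skip it
        match = WIKI_HOST.match(host)
        if match is None or not parsed.path.startswith("/wiki/"):
            continue
        # Decode percent-escapes and turn underscores back into spaces.
        title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
        if title:
            links.append((match.group("lang"), title))
    return links


sample = (
    "Reading https://en.wikipedia.org/wiki/Ada_Lovelace and "
    "https://de.m.wikipedia.org/wiki/K%C3%B6ln, not https://example.org/wiki/X"
)
result = extract_wikipedia_links(sample)
print(result)
```

Keeping the language edition next to the title matters because the same title can refer to different articles in different editions. Shortened links would need to be resolved before a sketch like this could see them.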
Taxonomy
Topics: Wikis in Education and Collaboration · Topic Modeling · Natural Language Processing Techniques
