Loklak - A Distributed Crawler and Data Harvester for Overcoming Rate Limits
Sudheesh Singanamalla, Michael Peter Christen

TL;DR
Loklak is an open-source distributed crawler that collects social media data from platforms such as Twitter and Weibo, overcoming rate limits and authentication barriers to support research.
Contribution
The paper introduces a peer-to-peer distributed crawling system that overcomes social network rate limits and authentication barriers, enabling continuous data collection.
Findings
Enables continuous data collection from social networks.
Overcomes rate limits and authentication barriers.
Provides an open data repository for research.
Abstract
Modern social networks have become sources of vast quantities of data. Access to such big data can be very useful to researchers and data scientists. In this paper we describe Loklak, an open-source, distributed, peer-to-peer crawler and scraper that supports such research on platforms like Twitter, Weibo and other social networks. Social networks such as Twitter and Weibo limit the rate at which users can freely collect such data for research. Our crawler enables researchers to collect data continuously while overcoming the authentication barriers and rate limits these platforms impose, providing a repository of open data as a service.
Taxonomy
Topics: Advanced Malware Detection Techniques · Peer-to-Peer Network Technologies · Web Data Mining and Analysis
