Reliable and Efficient Long-Term Social Media Monitoring
Jian Cao, Nicholas Adams-Cohen, R. Michael Alvarez

TL;DR
This paper introduces a cloud-based infrastructure for long-term social media data collection that mitigates the problems of LAN-based collection, providing reliable, efficient, and scalable data archiving at minimal cost.
Contribution
It presents a novel cloud computing system for social media data collection, addressing long-term reliability and efficiency issues faced by LAN-based methods.
Findings
System reduces data collection failures over long periods
Cost-effective solution for high-volume social media data archiving
Adaptable to multiple social media platforms
Abstract
Social media data is now widely used by academic researchers. However, long-term social media data collection projects, which typically involve collecting data from public-use APIs, often encounter issues when relying on servers on local-area networks (LANs) to collect high-volume streaming social media data over long periods of time. In this technical report, we present a cloud-based data collection, pre-processing, and archiving infrastructure, and argue that this system mitigates or resolves the problems typically encountered when running social media data collection projects on LANs, at minimal cloud-computing cost. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.
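The collect, pre-process, and archive loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`collect`, `keep_fields`, `backoff_delays`), the retained fields, the hourly gzip file layout, and the retry policy are all assumptions chosen for illustration, and `open_stream` stands in for whatever client connects to a platform's streaming API.

```python
import gzip
import json
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield exponentially growing wait times, capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2


def keep_fields(record: dict, fields: tuple = ("id", "created_at", "text")) -> dict:
    """Hypothetical pre-processing step: keep only a few fields."""
    return {k: record[k] for k in fields if k in record}


def collect(
    open_stream: Callable[[], Iterable[str]],
    out_dir: Path,
    max_failures: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read JSON lines from a stream, reconnecting after connection errors.

    Each record is appended to a gzip file named after the UTC hour in
    which it arrived. Returns the number of records written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    failures = 0
    delays = backoff_delays()
    while failures < max_failures:
        try:
            for line in open_stream():
                if not line.strip():
                    continue  # skip blank keep-alive lines
                record = keep_fields(json.loads(line))
                name = time.strftime("%Y%m%d-%H", time.gmtime()) + ".jsonl.gz"
                # Reopening per record is simple but slow; a long-running
                # collector would keep the handle open until the hour rolls over.
                with gzip.open(out_dir / name, "at", encoding="utf-8") as fh:
                    fh.write(json.dumps(record) + "\n")
                written += 1
            return written  # the stream ended cleanly
        except (ConnectionError, TimeoutError):
            failures += 1
            sleep(next(delays))  # wait before reconnecting
    return written
```

In a cloud deployment of this kind, the hourly files would typically be copied to object storage after each rollover, so that losing the collecting machine does not lose the archive. That step is omitted here because it depends on the provider.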
Taxonomy
Topics: Peer-to-Peer Network Technologies · Web Data Mining and Analysis · Caching and Content Delivery
