Unlocking Social Media and User Generated Content as a Data Source for Knowledge Management
James Meneghello, Nik Thompson, Kevin Lee, Kok Wai Wong, Bilal Abu-Salih

TL;DR
This paper presents novel methods for automatically collecting and unlocking user-generated content (UGC) from social media, making it more accessible as a data source for knowledge management and analytical applications.
Contribution
It introduces new algorithms for navigating pagination systems and for site-agnostic data collection, along with a publicly available testbed for future research.
Findings
The new algorithms are shown to increase the accessibility of UGC data.
Effective navigation of pagination systems for data extraction.
Benchmarking shows improved performance over existing techniques.
Abstract
The pervasiveness of Social Media and user-generated content (UGC) has triggered an exponential increase in global data volumes. However, because of collection and extraction challenges, the data in many feeds, embedded comments, reviews and testimonials remain inaccessible as a generic data source. This paper incorporates a Knowledge Management framework as the paradigm for knowledge management and data value extraction. This framework provides solutions that unlock the potential of UGC as a rich, real-time data source for analytical applications. The contributions described in this paper are threefold. Firstly, a method for automatically navigating pagination systems to expose UGC for collection is presented. This is evaluated using browser emulation integrated with dynamic data collection. Secondly, a new method for collecting social data without any a priori knowledge of the sites is introduced. Finally,…
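The abstract does not describe how the pagination-navigation method works internally. As a generic illustration only, and not the authors' algorithm, the sketch below shows the simplest form of the task: starting from one page, follow `rel="next"` links and gather comment text from every page reached. The in-memory `FAKE_SITE`, the `class="comment"` markup, and the names `PageParser` and `collect_all_pages` are all hypothetical; the paper's own evaluation uses browser emulation with dynamic data collection, which this static-HTML sketch does not attempt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site": three pages of comments chained by rel="next".
# A real collector would fetch pages over HTTP or through browser emulation.
FAKE_SITE = {
    "https://example.com/thread?page=1": (
        '<div class="comment">First!</div>'
        '<div class="comment">Nice post</div>'
        '<a rel="next" href="?page=2">Next</a>'
    ),
    "https://example.com/thread?page=2": (
        '<div class="comment">I disagree</div>'
        '<a rel="next" href="?page=3">Next</a>'
    ),
    "https://example.com/thread?page=3": (
        '<div class="comment">Last word</div>'
    ),
}


class PageParser(HTMLParser):
    """Collects text of class="comment" divs and the href of a rel="next" link."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.next_href = None
        self._in_comment = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "comment":
            self._in_comment = True
            self.comments.append("")
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_comment = False

    def handle_data(self, data):
        if self._in_comment:
            self.comments[-1] += data


def collect_all_pages(start_url, fetch, max_pages=100):
    """Follow rel="next" links from start_url, gathering comments from each page."""
    collected, seen, url = [], set(), start_url
    # Stop when there is no next link, when a page repeats (a pagination cycle),
    # or when the page budget is exhausted.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        parser.close()  # flush any buffered text
        collected.extend(parser.comments)
        # Resolve a relative "next" link against the current page URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return collected


comments = collect_all_pages("https://example.com/thread?page=1", FAKE_SITE.__getitem__)
print(comments)
```

Here `fetch` is any callable that maps a URL to HTML, so the in-memory dictionary can be swapped for a real HTTP client without changing the traversal. Note that hard-coding the `class="comment"` selector is exactly the kind of a priori site knowledge that the paper's second contribution aims to remove.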
