Efficient Sampling for Better OSN Data Provisioning
Nick Duffield, Balachander Krishnamurthy

TL;DR
This paper introduces a method for OSNs to provide tunable, weighted samples of user graphs in non-overlapping increments, improving data provision efficiency and accuracy for graph analysis.
Contribution
It presents a novel sampling technique that allows OSNs to release weighted graph samples of tunable size in non-overlapping increments, enhancing data utility.
Findings
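Enables OSNs to provide user-graph samples of tunable size, released in non-intersecting increments
Weighted sample selection improves accuracy when estimating graph features
Offers an alternative to resource-intensive graph crawling for obtaining graph data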
Abstract
Data concerning the users and usage of Online Social Networks (OSNs) has become available externally, from public resources (e.g., user profiles), participation in OSNs (e.g., establishing relationships and recording transactions such as user updates) and APIs of the OSN provider (such as the Twitter API). APIs let OSN providers monetize the release of data while helping control measurement load, e.g., by providing samples with different cost-granularity tradeoffs. To date, this approach has been more suited to releasing transactional data, with graphical data still being obtained by resource-intensive methods such as graph crawling. In this paper, we propose a method for OSNs to provide samples of the user graph of tunable size, in non-intersecting increments, with sample selection that can be weighted to enhance accuracy when estimating different features of the graph.
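To make the idea of tunable, non-intersecting, weighted increments concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: it assumes a priority-style scheme in which each node receives one fixed random priority `w / u`, and increments are consecutive slices of the resulting ordering. The function names (`priority_order`, `increments`), the toy weights, and the seed are all hypothetical.

```python
import random


def priority_order(weights, seed=0):
    """Order node ids by descending priority w / u, with u drawn from (0, 1].

    Assumption: a priority-style ranking; heavier nodes tend to rank earlier.
    """
    rng = random.Random(seed)
    priorities = {}
    for node, w in weights.items():
        u = 1.0 - rng.random()  # lies in (0, 1], so the division is safe
        priorities[node] = w / u
    return sorted(priorities, key=priorities.get, reverse=True)


def increments(order, sizes):
    """Cut the ordering into consecutive slices, one per requested size."""
    out, start = [], 0
    for k in sizes:
        out.append(order[start:start + k])
        start += k
    return out


# Hypothetical node weights (for example, a degree-like score per user).
weights = {f"user{i}": 1 + (i % 5) for i in range(20)}

order = priority_order(weights, seed=42)
first, second = increments(order, [5, 5])
```

Because the priorities are fixed once, a consumer who obtains `first` and later `second` holds the same nodes as a single sample of size 10 would contain, and the two increments share no nodes.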
