Curating Social Media Data

Kushal Vaghani

arXiv:2002.09202 · cs.SI · February 24, 2020 · 1 citation

Open Access

TL;DR

This paper introduces CrowdCorrect, a data curation pipeline for social media data that combines automated and crowd-sourced methods to improve data quality for analytics.

Contribution

It presents a novel pipeline that automates feature extraction and integrates crowd-sourcing for data correction, improving the accuracy of social media data analysis.

Findings

1. Effective automatic feature extraction from social media data.
2. Successful integration of crowd-sourced corrections.
3. Improved data quality for social media analytics.

Abstract

Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to their immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Therefore, data-driven analytics of social imprints has become a vital asset for organisations and governments to further improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides an automatic…
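The abstract does not spell out the pipeline's internals, so what follows is only a minimal sketch under stated assumptions, not CrowdCorrect itself. It illustrates the general idea of pairing an automatic stage with a crowd stage: the automatic stage extracts hashtags, mentions and URLs from a post and flags out-of-vocabulary tokens (slang, abbreviations, typos), and the crowd stage resolves each flagged token by majority vote over worker answers. The regular expressions, the vocabulary check, the strict-majority rule, the example data and all names (`CuratedPost`, `extract_features`, `needs_crowd_review`, `apply_crowd_corrections`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical patterns for common social media entities (illustration only).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")


@dataclass
class CuratedPost:
    raw: str
    text: str = ""
    urls: list = field(default_factory=list)
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)
    unknown_tokens: list = field(default_factory=list)
    corrections: dict = field(default_factory=dict)


def extract_features(raw, vocabulary):
    """Automatic stage: extract entities and flag out-of-vocabulary tokens."""
    post = CuratedPost(raw=raw)
    post.urls = URL_RE.findall(raw)
    # Remove URLs first so fragments such as "#section" are not read as hashtags.
    no_urls = URL_RE.sub(" ", raw)
    post.hashtags = [h.lower() for h in HASHTAG_RE.findall(no_urls)]
    post.mentions = [m.lower() for m in MENTION_RE.findall(no_urls)]
    # Normalize the remaining free text to lowercase tokens.
    free_text = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", no_urls)).lower()
    tokens = TOKEN_RE.findall(free_text)
    post.text = " ".join(tokens)
    # Tokens outside the vocabulary (slang, abbreviations, typos) go to the crowd.
    post.unknown_tokens = [t for t in tokens if t not in vocabulary]
    return post


def needs_crowd_review(post):
    """Route a post to crowd workers only if the automatic stage left unknowns."""
    return bool(post.unknown_tokens)


def apply_crowd_corrections(post, answers):
    """Crowd stage: accept a correction only when a strict majority agrees."""
    for token in post.unknown_tokens:
        votes = Counter(answers.get(token, []))
        if not votes:
            continue
        best, count = votes.most_common(1)[0]
        if 2 * count > sum(votes.values()):
            post.corrections[token] = best
    post.text = " ".join(post.corrections.get(t, t) for t in post.text.split())
    return post


if __name__ == "__main__":
    vocab = {"traffic", "on", "the", "freeway", "is", "really", "bad", "today"}
    post = extract_features(
        "Traffic on the freeway is rly bad 2day #commute @CityRoads https://t.co/abc",
        vocab,
    )
    if needs_crowd_review(post):
        # Stand-in for answers collected from crowd workers (made-up example data).
        answers = {"rly": ["really", "really", "rarely"], "2day": ["today", "today"]}
        post = apply_crowd_corrections(post, answers)
    print(post.text)  # traffic on the freeway is really bad today
```

Requiring a strict majority leaves tied tokens uncorrected, so in a setup like this they could be sent back to the crowd for more answers instead of being resolved arbitrarily.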

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Web Data Mining and Analysis · Data Quality and Management