You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

Beatrice Perez; Mirco Musolesi; Gianluca Stringhini

arXiv:1803.10133 · cs.CR · May 15, 2018 · 5 citations

TL;DR

This paper demonstrates that social media metadata can uniquely identify users with high accuracy using machine learning, and that obfuscation strategies are largely ineffective against such identification.

Contribution

It quantifies the identifiability of users from metadata and evaluates the robustness of obfuscation methods, highlighting challenges in protecting user privacy.

Findings

01. 96.7% accuracy in user identification with 10,000 users
02. 99.22% accuracy when considering the top 10 candidates
03. Obfuscation that perturbs 60% of the data remains largely ineffective
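Finding 02 counts a prediction as correct when the true account appears anywhere among the classifier's 10 highest-ranked candidates. The sketch below computes that top-k accuracy. It assumes that the classifier produces one score per candidate account. The function name and the toy scores are hypothetical and do not come from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Return the fraction of samples whose true account is among the k best-scored candidates.

    scores[i][j] is a (hypothetical) classifier score for sample i belonging to account j;
    labels[i] is the index of the account that actually wrote sample i.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank candidate accounts by descending score and keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if true_label in top_k:
            hits += 1
    return hits / len(scores)


# Three tweets scored against three candidate accounts (toy numbers).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))  # 1 of 3 correct at k=1
print(top_k_accuracy(scores, labels, 2))  # 2 of 3 correct at k=2
```

Widening k can only keep or add hits, never remove them. This is why the top-10 figure is at least as high as the top-1 figure.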

Abstract

Metadata are associated with most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still categorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of identifying a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a…
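The abstract describes combining atomic metadata fields and classifying new tweets by the account that posted them. The sketch below illustrates that idea under stated assumptions:

- It uses a nearest-centroid classifier, a deliberately simple stand-in that is not one of the algorithms the paper evaluates.
- It uses three hypothetical numeric fields named after Twitter user-object counters.
- The toy accounts and all function names are invented for illustration.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical numeric metadata fields, named after Twitter user-object counters.
FIELDS = ("followers_count", "friends_count", "statuses_count")

Sample = Tuple[Dict[str, float], str]  # (metadata attached to one tweet, account that posted it)


def featurize(meta: Dict[str, float], fields: Sequence[str]) -> List[float]:
    # log1p compresses heavy-tailed counts so one large field does not dominate the distance.
    return [math.log1p(meta[f]) for f in fields]


def fit_centroids(train: Sequence[Sample], fields: Sequence[str]) -> Dict[str, List[float]]:
    # Average feature vector per account (nearest-centroid classifier).
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for meta, account in train:
        vec = featurize(meta, fields)
        running = sums.setdefault(account, [0.0] * len(vec))
        sums[account] = [s + v for s, v in zip(running, vec)]
        counts[account] = counts.get(account, 0) + 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def predict(centroids: Dict[str, List[float]], meta: Dict[str, float], fields: Sequence[str]) -> str:
    # Assign the tweet to the account whose centroid is closest in Euclidean distance.
    vec = featurize(meta, fields)
    return min(centroids, key=lambda a: math.dist(vec, centroids[a]))


def accuracy_per_combination(train: Sequence[Sample], test: Sequence[Sample]) -> Dict[Tuple[str, ...], float]:
    # Systematically try every non-empty combination of the atomic fields.
    results: Dict[Tuple[str, ...], float] = {}
    for r in range(1, len(FIELDS) + 1):
        for combo in itertools.combinations(FIELDS, r):
            centroids = fit_centroids(train, combo)
            correct = sum(predict(centroids, meta, combo) == acct for meta, acct in test)
            results[combo] = correct / len(test)
    return results


# Toy data: two hypothetical accounts with clearly different metadata profiles.
train = [
    ({"followers_count": 100, "friends_count": 50, "statuses_count": 1000}, "alice"),
    ({"followers_count": 102, "friends_count": 50, "statuses_count": 1003}, "alice"),
    ({"followers_count": 10000, "friends_count": 5, "statuses_count": 20}, "bob"),
    ({"followers_count": 10005, "friends_count": 6, "statuses_count": 20}, "bob"),
]
test = [
    ({"followers_count": 105, "friends_count": 52, "statuses_count": 1010}, "alice"),
    ({"followers_count": 10010, "friends_count": 5, "statuses_count": 21}, "bob"),
]
for combo, score in accuracy_per_combination(train, test).items():
    print(combo, score)
```

On this toy set every field combination separates the two accounts. On real data, the informative output is how accuracy varies from one combination to another.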

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Authorship Attribution and Profiling