Fighting Authorship Linkability with Crowdsourcing

Mishari Almishari; Ekin Oguz; Gene Tsudik

arXiv:1405.4918 · cs.DL · May 21, 2014 · 5 cites

TL;DR

This paper investigates how crowdsourced rewriting and machine translation can reduce the authorship linkability of online reviews, making them harder to link through stylometric analysis.

Contribution

The paper introduces approaches that combine crowdsourcing and machine translation to reduce the stylometric linkability of reviews and so improve author privacy.

Findings

01. Crowdsourced rewriting produces reviews with diverse stylometric features.

02. Machine translation decreases linkability as the number of intermediate languages increases.

03. Combining crowdsourcing and translation further reduces authorship linkability.
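
The translation finding describes passing a review through a chain of intermediate languages and back to the original language. The following is a minimal sketch of such a chain, not the authors' implementation. It assumes a hypothetical `translate(text, source_lang, target_lang)` callable supplied by the reader; the language codes and the stand-in translator are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature (an assumption, not a real library API):
# translate(text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


def round_trip(text: str,
               intermediates: Sequence[str],
               translate: Translator,
               home: str = "en") -> str:
    """Translate text through each intermediate language in order, then back to home."""
    current_lang = home
    for lang in intermediates:
        text = translate(text, current_lang, lang)
        current_lang = lang
    # Return to the original language only if at least one hop was made.
    if current_lang != home:
        text = translate(text, current_lang, home)
    return text


def recording_translator(log: List[Tuple[str, str]]) -> Translator:
    """Stand-in translator: records each hop and returns the text unchanged."""
    def translate(text: str, src: str, dst: str) -> str:
        log.append((src, dst))
        return text
    return translate


hops: List[Tuple[str, str]] = []
result = round_trip("Great food, slow service.", ["ja", "de"],
                    recording_translator(hops))
```

With the stand-in translator, `hops` records the path en to ja, ja to de, de to en. Replacing it with a real machine translation client would let a reader compare stylometric features before and after the round trip for chains of different lengths.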

Abstract

Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy. In this paper, we start by showing that the problem is actually worse…

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Spam and Phishing Detection