Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
Robert Dilworth

TL;DR
This paper explores how Unicode steganography can undermine authorship attribution by analyzing stylometric techniques and proposing counter-strategies to protect author anonymity.
Contribution
It introduces Unicode steganography as a novel method to counteract stylometric analysis and enhances adversarial stylometry techniques.
Findings
Unicode steganography can effectively obscure stylometric features.
Counter-strategies can improve authorship anonymization.
Authors remain vulnerable to stylometric analysis despite conventional anonymization efforts.
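As a rough illustration of the first finding, the sketch below shows how invisible Unicode characters can disturb a simple stylometric feature. It is not the paper's method: the zero-width-space injection scheme, the character trigram feature, and all function names (`char_ngrams`, `inject_zero_width`, `jaccard`, `strip_zero_width`) are assumptions chosen for the example.

```python
from collections import Counter

ZWSP = "\u200b"  # ZERO WIDTH SPACE: typically renders as nothing

def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a common stylometric feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def inject_zero_width(text, every=2):
    """Insert ZWSP after every `every`-th character (hypothetical scheme)."""
    out = []
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        if i % every == 0:
            out.append(ZWSP)
    return "".join(out)

def jaccard(a, b):
    """Jaccard overlap between the n-gram sets of two counters."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0

def strip_zero_width(text):
    """Undo the injection by deleting every ZWSP."""
    return text.replace(ZWSP, "")

original = "The content of the message exposes the author."
altered = inject_zero_width(original)

# Every trigram of the altered text contains a ZWSP, so none of them
# matches a trigram of the original text.
similarity = jaccard(char_ngrams(original), char_ngrams(altered))
print(f"trigram overlap: {similarity:.2f}")
```

The altered string looks the same as the original when rendered, yet it shares no character trigrams with it. The `strip_zero_width` helper also shows the limit of this naive scheme: an analyst who filters zero-width code points before extracting features recovers the original text exactly.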
Abstract
When using a public communication channel--whether formal or informal, such as commenting or posting on social media--end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes utmost precautions to anonymize their online presence--using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting--one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to validate their actual identity, right? Wrong. The content of their message--necessarily open for public consumption--exposes an…
