Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem
Kaspar Hageman, Álvaro Feal, Julien Gamba, Aniketh Girish, Jakob Bleier, Martina Lindorfer, Juan Tapiador, Narseo Vallina-Rodriguez

TL;DR
This paper empirically investigates the challenges of author attribution in the Android ecosystem, revealing the unreliability of common signals like metadata and signing certificates due to their volatility and misuse.
Contribution
It provides the first comprehensive empirical analysis of attribution signals in Android markets and introduces the attribution graph to evaluate their validity.
Findings
Market metadata is often missing or volatile over time.
Signing certificates are shared among different authors, reducing their reliability.
The attribution graph reveals the confusion caused by unreliable signals.
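To make the shared-certificate finding concrete, here is a minimal sketch of how conflicting attribution signals could be surfaced with a graph. This summary does not describe how the paper actually builds its attribution graph, so everything below is an assumption for illustration: the sample records, the node kinds (`app`, `dev`, `cert`), and the helper names `build_components` and `conflicting_components` are hypothetical. The idea is to link each app to its market developer name and its signing-certificate fingerprint, then flag any connected component that contains more than one developer name.

```python
from collections import defaultdict

# Hypothetical records: (package name, market developer name, certificate fingerprint).
# These values are invented for illustration and are not data from the paper.
APPS = [
    ("com.example.alpha", "Alpha Studio", "cert-A"),
    ("com.example.beta", "Alpha Studio", "cert-B"),
    ("com.example.gamma", "Gamma Labs", "cert-B"),  # shares cert-B with another developer name
    ("com.example.delta", "Delta Inc", "cert-D"),
]


def build_components(apps):
    """Group apps, developer names, and certificates into connected components."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for package, developer, certificate in apps:
        # Each app is linked to both of its attribution signals.
        union(("app", package), ("dev", developer))
        union(("app", package), ("cert", certificate))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def conflicting_components(components):
    """Return components in which more than one developer name appears."""
    return [
        component
        for component in components
        if sum(1 for kind, _ in component if kind == "dev") > 1
    ]


if __name__ == "__main__":
    for component in conflicting_components(build_components(APPS)):
        names = sorted(value for kind, value in component if kind == "dev")
        print("Conflicting developer names:", names)
```

In this toy data, the certificate `cert-B` ties "Alpha Studio" and "Gamma Labs" into one component, which is the kind of ambiguity the findings describe: the two names may belong to the same author, or two unrelated authors may be sharing a key, and the signals alone cannot say which.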
Abstract
The ability to identify the author responsible for a given software object is critical for many research studies and for enhancing software transparency and accountability. However, as opposed to other application markets like iOS, attribution in the Android ecosystem is known to be hard. Prior research has leveraged market metadata and signing certificates to identify software authors without questioning the validity and accuracy of these attribution signals. However, Android app authors can, either intentionally or by mistake, hide their true identity due to: (1) the lack of policy enforcement by markets to ensure the accuracy and correctness of the information disclosed by developers in their market profiles during the app release process, and (2) the use of self-signed certificates for signing apps instead of certificates issued by trusted CAs. In this paper, we perform the first…
Taxonomy
Topics: Privacy, Security, and Data Protection · Spam and Phishing Detection · Copyright and Intellectual Property
