Negative Results of Fusing Code and Documentation for Learning to Accurately Identify Sensitive Source and Sink Methods: An Application to the Android Framework for Data Leak Detection
Jordan Samhi, Maria Kober, Abdoul Kader Kabore, Steven Arzt, Tegawendé F. Bissyandé, Jacques Klein

TL;DR
This paper presents CoDoC, a deep learning-based tool that combines source code and documentation to accurately identify sensitive source and sink methods in Android, improving precision over previous models but still facing challenges in real-world scenarios.
Contribution
CoDoC introduces a novel approach using deep learning and combined code-documentation analysis to better identify privacy-related API methods in Android apps.
Findings
CoDoC achieves 91% precision, recall, and F1 score in cross-validation.
It outperforms the state-of-the-art SuSi on the same dataset.
Real-world performance of machine learning models for privacy detection remains limited.
Abstract
Apps on mobile phones manipulate all sorts of data, including sensitive data, leading to privacy-related concerns. Recent regulations like the European GDPR provide rules for the processing of personal and sensitive data, like that no such data may be leaked without the consent of the user. Researchers have proposed sophisticated approaches to track sensitive data within mobile apps, all of which rely on specific lists of sensitive source and sink API methods. The data flow analysis results greatly depend on these lists' quality. Previous approaches either used incomplete hand-written lists that quickly became outdated or relied on machine learning. The latter, however, leads to numerous false positives, as we show. This paper introduces CoDoC, a tool that aims to revive the machine-learning approach to precisely identify privacy-related source and sink API methods. In contrast to…
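The dependence of data flow results on the source and sink lists can be illustrated with a minimal sketch. This is a toy taint tracker, not CoDoC or any tool from the paper. The method names, the statement format, and the `find_leaks` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical source and sink lists; the method names are illustrative only
# and are not taken from the paper's dataset.
SOURCES = {"getDeviceId", "getLastKnownLocation"}
SINKS = {"sendTextMessage", "writeToLog"}


def find_leaks(statements, sources, sinks):
    """Report (sink, variable) pairs where a tainted variable reaches a sink.

    Each statement is a tuple (target, call, args): `target` is the variable
    being assigned (or None), `call` is the invoked method name, and `args`
    is the list of variable names passed to the call.
    """
    tainted = set()
    leaks = []
    for target, call, args in statements:
        arg_tainted = any(a in tainted for a in args)
        # A sink call leaks every tainted argument it receives.
        if call in sinks:
            leaks.extend((call, a) for a in args if a in tainted)
        # Taint comes from a source call or propagates from tainted arguments;
        # otherwise the assignment overwrites any earlier taint.
        if target is not None:
            if call in sources or arg_tainted:
                tainted.add(target)
            else:
                tainted.discard(target)
    return leaks


# A straight-line program: read an identifier, build a message, send it.
program = [
    ("id", "getDeviceId", []),
    ("msg", "concat", ["prefix", "id"]),
    (None, "sendTextMessage", ["msg"]),
]

# A benign program that only logs a non-sensitive value.
benign = [
    ("name", "getAppName", []),
    (None, "writeToLog", ["name"]),
]

complete = find_leaks(program, SOURCES, SINKS)
missed = find_leaks(program, SOURCES - {"getDeviceId"}, SINKS)
spurious = find_leaks(benign, SOURCES | {"getAppName"}, SINKS)
```

With the full list, `complete` reports the flow into `sendTextMessage`. Dropping `getDeviceId` from the sources makes the same leak invisible (`missed` is empty), while wrongly listing `getAppName` as a source yields a false positive in `spurious`. Both failure modes come from the list, not from the analysis itself.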
Taxonomy
Topics: Advanced Malware Detection Techniques · Privacy, Security, and Data Protection · Internet Traffic Analysis and Secure E-voting
