Protecting Privacy in Software Logs: What Should Be Anonymized?
Roozbeh Aghili, Heng Li, Foutse Khomh

TL;DR
This paper analyzes privacy concerns in software logs by examining datasets, regulations, literature, and industry practices to identify sensitive information and highlight the need for standardized anonymization guidelines.
Contribution
It provides a comprehensive, multi-perspective analysis of log privacy, bridging gaps between regulations, research, and industry practices to inform better anonymization strategies.
Findings
Identified potentially sensitive attributes in 25 publicly available log datasets
Analyzed privacy regulations, research literature, and industry practices for log anonymization
Highlighted open challenges and the need for standardized anonymization guidelines
Abstract
Software logs, generated during the runtime of software systems, are essential for various development and analysis activities, such as anomaly detection and failure diagnosis. However, the presence of sensitive information in these logs poses significant privacy concerns, particularly regarding Personally Identifiable Information (PII) and quasi-identifiers that could lead to re-identification risks. While general data privacy has been extensively studied, the specific domain of privacy in software logs remains underexplored, with inconsistent definitions of sensitivity and a lack of standardized guidelines for anonymization. To address this gap, this study offers a comprehensive analysis of privacy in software logs from multiple perspectives. We start by analyzing 25 publicly available log datasets to identify potentially sensitive attributes. Based on the result of…
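Illustrative example
As a rough illustration of what identifying and anonymizing a sensitive attribute in a log line can look like, the sketch below flags and masks two attribute types (email addresses and IPv4 addresses) with regular expressions. The patterns, the function names find_sensitive and mask_line, and the sample log line are all hypothetical; they are not taken from the paper, and the attribute types are chosen only as familiar examples of PII and quasi-identifiers. Real logs would likely need a broader and more carefully validated set of rules.

```python
import re

# Hypothetical, non-exhaustive patterns for two attribute types.
# These are assumptions for illustration, not the paper's method.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_sensitive(line):
    """Return (label, matched_text) pairs for every pattern hit in one log line."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(line))
    return hits


def mask_line(line):
    """Replace each matched value with a placeholder such as <IPV4>."""
    for label, pattern in PATTERNS.items():
        line = pattern.sub(f"<{label}>", line)
    return line


# Made-up log line used only to exercise the functions above.
sample = "2024-01-01 12:00:00 INFO login ok user=alice@example.com src=192.168.0.7"

if __name__ == "__main__":
    print(find_sensitive(sample))
    print(mask_line(sample))
```

Replacing values with typed placeholders keeps the log template intact for downstream tasks such as parsing, at the cost of losing the ability to correlate events by the masked value.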
Taxonomy
Topics: Auction Theory and Applications
