SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili, Xingfang Wu, Foutse Khomh, Heng Li

TL;DR
SDLog is a deep learning framework that effectively detects sensitive information in software logs, surpassing traditional regex-based methods and requiring minimal fine-tuning data.
Contribution
This paper introduces SDLog, the first deep learning approach for log anonymization that outperforms regex-based techniques with limited training samples.
Findings
SDLog correctly identifies 99.5% of sensitive attributes with minimal data.
Achieves an F1-score of 98.4% in sensitive information detection.
Outperforms existing regex-based methods in accuracy and generalizability.
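For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below shows the standard formula only. The counts are hypothetical, and the summary does not say whether the paper scores detection per token or per attribute.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    predicted = true_pos + false_pos   # everything flagged as sensitive
    actual = true_pos + false_neg      # everything that really is sensitive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {example_f1:.3f}")
```

Because F1 penalizes both missed sensitive values and false alarms, it is a stricter summary than the share of sensitive attributes found.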
Abstract
Software logs are messages recorded during the execution of a software system that provide crucial run-time information about events and activities. Although software logs play a critical role in software maintenance and operation tasks, publicly accessible log datasets remain limited, hindering advances in log analysis research and practice. The presence of sensitive information, particularly Personally Identifiable Information (PII) and quasi-identifiers, introduces serious privacy and re-identification risks, discouraging the publishing and sharing of real-world logs. In practice, log anonymization techniques primarily rely on regular expression patterns, which involve manually crafting rules to identify and replace sensitive information. However, these regex-based approaches suffer from significant limitations, such as extensive manual effort and poor generalizability across…
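The regex-based baseline the abstract describes can be illustrated with a minimal sketch. The rules, placeholder tokens, and sample log lines below are hypothetical and written for this example. They are not taken from the paper or from any specific anonymization tool.

```python
import re

# Hypothetical hand-written rules: each maps a placeholder to a pattern.
# Real deployments need many more rules, maintained per log format.
RULES = [
    ("<EMAIL>", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("<IPV4>", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def anonymize(line: str) -> str:
    """Replace every match of every rule with its placeholder."""
    for placeholder, pattern in RULES:
        line = pattern.sub(placeholder, line)
    return line


# Covered by the rules: both values are replaced.
print(anonymize("Accepted login for alice@example.com from 10.0.0.5 port 22"))

# Not covered by any rule: the username and hostname pass through unchanged.
print(anonymize("session opened for user alice on host db-01"))
```

The second call shows the generalizability problem in miniature: a sensitive value that no rule anticipates is left in the log until someone writes a new pattern for it.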
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Anomaly Detection Techniques and Applications
