Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis
Lixi Zhou, Lei Yu, Jia Zou, Hong Min

TL;DR
This paper introduces a source code analysis method for privacy-preserving redaction of diagnostic logs, improving detection precision and reducing both false positives and false negatives compared to existing tools.
Contribution
It proposes a novel source code analysis approach that accurately identifies sensitive information in logs using data flow graphs and logger code augmentation.
Findings
Improved detection precision over baseline tools
Reduced false positives and negatives in log redaction
Effective preservation of diagnostic information
Abstract
Protecting sensitive information in diagnostic data, such as logs, is a critical concern in the industrial software diagnosis and debugging process. Although many tools have been developed to automatically redact logs by identifying and removing sensitive information, they have severe limitations that can cause over-redaction and loss of critical diagnostic information (false positives), disclosure of sensitive information (false negatives), or both. To address this problem, we argue for a source code analysis approach to log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code using logger code augmentation, and checks whether the log statement outputs data from sensitive sources by using the data flow graph built from the source code. Appropriate redaction rules are further…
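The two steps named in the abstract, mapping a log message back to its log statement and checking that statement's arguments against a data flow graph, can be illustrated with a toy sketch. This is not the authors' implementation. The graph, the sensitive source name, the statement IDs (`S1`, `S2`), the `[ID]` message prefix, and the `<REDACTED>` placeholder are all hypothetical stand-ins chosen for illustration. A real system would derive the graph and the statement table from static analysis of the source code.

```python
import re
from collections import deque

# Hypothetical data flow graph: each variable maps to what it is derived from.
DATA_FLOW = {
    "user_email": ["request.form.email"],
    "recipient": ["user_email"],
    "retry_count": ["config.retries"],
}

# Hypothetical set of sources considered sensitive.
SENSITIVE_SOURCES = {"request.form.email"}

# Hypothetical table of log statements. The augmented logger is assumed to
# prefix every message with the ID of the statement that produced it.
LOG_STATEMENTS = {
    "S1": {"template": "sending mail to {}", "args": ["recipient"]},
    "S2": {"template": "retrying, attempt {}", "args": ["retry_count"]},
}


def reaches_sensitive_source(var):
    """Walk the graph backwards from var; True if a sensitive source is reached."""
    seen = set()
    queue = deque([var])
    while queue:
        node = queue.popleft()
        if node in SENSITIVE_SOURCES:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(DATA_FLOW.get(node, []))
    return False


def redact(log_line):
    """Redact only the fields whose arguments derive from sensitive sources."""
    match = re.match(r"\[(\w+)\] (.*)", log_line)
    if not match or match.group(1) not in LOG_STATEMENTS:
        return log_line  # unknown statement: left untouched in this sketch
    stmt_id, message = match.group(1), match.group(2)
    stmt = LOG_STATEMENTS[stmt_id]
    # Turn the template into a regex with one capture group per argument.
    pattern = "(.*)".join(re.escape(part) for part in stmt["template"].split("{}"))
    fields = re.fullmatch(pattern, message)
    if not fields:
        return log_line
    rendered = [
        "<REDACTED>" if reaches_sensitive_source(arg) else value
        for arg, value in zip(stmt["args"], fields.groups())
    ]
    return f"[{stmt_id}] " + stmt["template"].format(*rendered)


print(redact("[S1] sending mail to alice@example.com"))
print(redact("[S2] retrying, attempt 3"))
```

In this sketch the e-mail address is redacted because `recipient` traces back to a sensitive source, while the retry counter is preserved because it does not. The sketch passes unrecognized lines through unchanged, which is a simplification and says nothing about how the paper handles that case.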
