Knowledge & Learning-based Adaptable System for Sensitive Information Identification and Handling
Akshar Kaul, Manish Kesarwani, Hong Min, Qi Zhang

TL;DR
KLASSIFI is an adaptable system that identifies and redacts sensitive information in diagnostic data such as logs and memory dumps, while retaining the metadata that debugging tools rely on. It processes a 128 GB file in 84 minutes and scales linearly with data size.
Contribution
The paper introduces KLASSIFI, a customizable, end-to-end system for sensitive information handling in diagnostic data, with optimized performance and scalability.
Findings
Processes 128 GB files in 84 minutes
Performance scales linearly with data size
Maintains metadata for debugging tools
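To put the reported figures in perspective, the sketch below turns them into an approximate throughput and a linear extrapolation to other file sizes. It assumes 1 GB = 10^9 bytes, which the source does not specify, and the function names are illustrative. The extrapolation rests on the single reported data point, so it is only a rough estimate.

```python
# Back-of-the-envelope numbers from the reported result (128 GB in 84 minutes).
# Assumption: GB means 10**9 bytes; the source does not state the convention.
FILE_SIZE_GB = 128
ELAPSED_MIN = 84


def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average processing rate in MB/s (1 GB taken as 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


def estimated_minutes(size_gb: float) -> float:
    """Linear extrapolation from the single reported data point."""
    return size_gb * ELAPSED_MIN / FILE_SIZE_GB


rate = throughput_mb_per_s(FILE_SIZE_GB, ELAPSED_MIN)
print(f"approx. throughput: {rate:.1f} MB/s")
print(f"estimated time for 256 GB: {estimated_minutes(256):.0f} minutes")
```

Under the linear-scaling finding, doubling the input size doubles the estimated run time.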
Abstract
Diagnostic data such as logs and memory dumps from production systems are often shared with development teams for root cause analysis of system crashes. Such diagnostic data invariably contains sensitive information, and sharing it can lead to data leaks. To handle this problem, we present the Knowledge and Learning-based Adaptable System for Sensitive InFormation Identification and Handling (KLASSIFI), an end-to-end system capable of identifying and redacting sensitive information present in diagnostic data. KLASSIFI is highly customizable: changing the configuration adapts it to different business use cases. KLASSIFI keeps the output file useful by retaining the metadata used by various debugging tools. Various optimizations have been done to improve the performance of KLASSIFI. Empirical evaluation of KLASSIFI shows that it is…
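The idea of configuration-driven redaction that keeps the output usable can be illustrated with a minimal sketch. This is not KLASSIFI's implementation: the rule names, the regular expressions, and the `redact_line` helper are hypothetical, and length-preserving masking is only one assumed way of keeping offsets stable for downstream tools.

```python
import re

# Hypothetical configuration: rule names and patterns are illustrative only
# and are not taken from the paper.
CONFIG = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}


def compile_rules(config):
    """Compile each configured pattern once, up front."""
    return [(name, re.compile(pattern)) for name, pattern in config.items()]


def redact_line(line, rules, mask_char="*"):
    """Replace every match with a mask of the same length.

    Keeping the length unchanged leaves the position of the surrounding
    text (timestamps, log levels, field offsets) untouched.
    """
    for _name, regex in rules:
        line = regex.sub(lambda m: mask_char * len(m.group(0)), line)
    return line


RULES = compile_rules(CONFIG)
sample = "2021-03-04 12:00:01 ERROR login failed for alice@example.com from 10.0.0.7"
redacted = redact_line(sample, RULES)
print(redacted)
```

Adapting the sketch to another use case means editing `CONFIG` only, which mirrors the configuration-based customization the abstract describes.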
Taxonomy
Topics: Digital and Cyber Forensics · Anomaly Detection Techniques and Applications · Security and Verification in Computing
