Sensitive Information Detection: Recursive Neural Networks for Encoding Context
Jan Neerbek

TL;DR
This paper introduces a novel deep learning approach using recursive neural networks for detecting sensitive information in unstructured text, significantly outperforming previous keyword-based methods.
Contribution
The paper develops a context-aware recursive neural network model for sensitive information detection that relies solely on labeled examples, avoiding rule-based limitations.
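As a rough illustration of what a recursive neural network over text does, the sketch below composes word vectors bottom-up over a binary tree and maps the root vector to a score between 0 and 1. It is a toy under stated assumptions, not the paper's model: the names (encode, sensitivity_score), the dimension, the tree shape, and the random untrained weights are all hypothetical. In a real system the weights would be fitted from labeled examples.

```python
import math
import random

# Hypothetical toy setup: dimension, names, and random weights are illustrative only.
DIM = 4
rng = random.Random(0)

def rand_vector(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

W = [rand_vector(2 * DIM) for _ in range(DIM)]  # composition weights shared by all inner nodes
b = [0.0] * DIM                                 # composition bias
u = rand_vector(DIM)                            # output weights for the final score
embeddings = {}                                 # word -> vector, created on first use

def embed(word):
    if word not in embeddings:
        embeddings[word] = rand_vector(DIM)
    return embeddings[word]

def encode(tree):
    """Encode a binary tree: a leaf is a word (str), an inner node is a (left, right) tuple."""
    if isinstance(tree, str):
        return embed(tree)
    left, right = tree
    children = encode(left) + encode(right)  # concatenated child vectors, length 2 * DIM
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
        for row, bias in zip(W, b)
    ]

def sensitivity_score(tree):
    """Map the root vector to a value in (0, 1) with a logistic output unit."""
    h = encode(tree)
    z = sum(ui * hi for ui, hi in zip(u, h))
    return 1.0 / (1.0 + math.exp(-z))

# Assumed bracketing of a short sentence; a parser would normally supply the tree.
tree = (("the", "merger"), ("is", "confidential"))
score = sensitivity_score(tree)
```

Because the same composition weights are applied at every node, the vector for a phrase depends on the words around it and on their order, which is one way a model of this kind can encode context rather than match isolated keywords.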
Findings
Deep neural models outperform keyword-based methods
Context-based detection improves accuracy on real-world data
Approach requires only labeled examples, not rules or seed words
Abstract
The amount of data for processing and categorization grows at an ever-increasing rate. At the same time, the demand for collaboration and transparency in organizations, government, and businesses drives the release of data from internal repositories to the public or third-party domain. This in turn increases the potential for sharing sensitive information. The leak of sensitive information can potentially be very costly, both financially for organizations and for individuals. In this work we address the important problem of sensitive information detection. Specifically, we focus on detection in unstructured text documents. We show that simplistic, brittle rule sets for detecting sensitive information find only a small fraction of the actual sensitive information. Furthermore, we show that previous state-of-the-art approaches have been implicitly tailored to such simplistic scenarios and…
Taxonomy
Topics: Topic Modeling · Spam and Phishing Detection · Text and Document Classification Technologies
