Automated Big Text Security Classification
Khudran Alzhrani, Ethan M. Rudd, Terrance E. Boult, and C. Edward Chow

TL;DR
This paper introduces ACESS, a novel model for classifying sensitive information in large texts at paragraph level, using a new dataset of leaked diplomatic cables to improve insider threat detection.
Contribution
The paper presents ACESS, a new detection model for big text security classification, and constructs the first dataset of sensitive paragraphs from WikiLeaks cables for analysis.
Findings
ACESS effectively classifies sensitive paragraphs in large texts.
The dataset enables detailed analysis of sensitive information at paragraph granularity.
ACESS outperforms existing document-based detection methods.
Abstract
In recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government's relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization's network. However, state-of-the-art DLP detection models can detect only very limited types of sensitive information, and research in the field has been hindered by the lack of available sensitive texts. Many researchers have focused on document-based detection with artificially labeled "confidential documents" for which security labels are assigned to the entire document, when in reality only a portion of the…
