Content-based data leakage detection using extended fingerprinting

Yuri Shapira; Bracha Shapira; Asaf Shabtai

arXiv:1302.2028 · cs.CR · February 11, 2013 · 25 cites

Open Access

TL;DR

This paper introduces an extended fingerprinting method based on sorted k-skip-n-grams that improves data leakage detection by focusing on core confidential content and resisting content rephrasing.

Contribution

The paper presents a novel fingerprinting approach that isolates confidential content and enhances robustness against rephrasing, addressing limitations of existing methods.
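As a rough illustration of why sorted k-skip-n-grams can tolerate rephrasing, the sketch below builds a fingerprint from them and scores outgoing text against it. It is a minimal sketch under stated assumptions, not the authors' implementation: the listing does not define the construction, so the code assumes a k-skip-n-gram is n words taken in order from a window of n + k consecutive words, and that "sorted" means the words inside each gram are sorted alphabetically. The function names (`sorted_skip_ngrams`, `fingerprint`, `containment`), the tokenizer, the SHA-1 hashing, and the containment score are all hypothetical choices made for illustration.

```python
import hashlib
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sorted_skip_ngrams(tokens, n, k):
    """Return the set of sorted k-skip-n-grams (assumed definition).

    Each gram starts at some token, picks n - 1 further tokens from the
    next n - 1 + k tokens (so at most k tokens are skipped), and is then
    sorted so that word order inside the gram no longer matters.
    """
    grams = set()
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n + k]
        first, rest = window[0], window[1:]
        for combo in combinations(rest, n - 1):
            grams.add(tuple(sorted((first,) + combo)))
    return grams


def fingerprint(text, n=3, k=1):
    """Hash every sorted skip-gram of the text into a set of signatures."""
    grams = sorted_skip_ngrams(tokenize(text), n, k)
    return {hashlib.sha1(" ".join(g).encode("utf-8")).hexdigest() for g in grams}


def containment(confidential_fp, outgoing_fp):
    """Fraction of the confidential signatures that appear in outgoing text."""
    if not confidential_fp:
        return 0.0
    return len(confidential_fp & outgoing_fp) / len(confidential_fp)


if __name__ == "__main__":
    secret = fingerprint("the quarterly revenue forecast", n=2, k=1)
    reordered = fingerprint("the revenue quarterly forecast", n=2, k=1)
    unrelated = fingerprint("lunch menu for friday", n=2, k=1)
    print(containment(secret, reordered))
    print(containment(secret, unrelated))
```

In this toy case, swapping two adjacent words leaves the set of sorted skip-grams unchanged, whereas plain contiguous n-grams would lose most of their matches. How the paper selects which grams count as confidential, and how it thresholds the score, is not covered by this sketch.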

Findings

01. More accurate detection of confidential content

02. Robustness against content rephrasing

03. Ability to detect unseen confidential documents

Abstract

Protecting sensitive information from unauthorized disclosure is a major concern of every organization. As an organization's employees need to access such information in order to carry out their daily work, data leakage detection is both an essential and challenging task. Whether caused by malicious intent or an inadvertent mistake, data loss can result in significant damage to the organization. Fingerprinting is a content-based method used for detecting data leakage. In fingerprinting, signatures of known confidential content are extracted and matched with outgoing content in order to detect leakage of sensitive content. Existing fingerprinting methods, however, suffer from two major limitations. First, fingerprinting can be bypassed by rephrasing (or minor modification) of the confidential content, and second, usually the whole content of the document is fingerprinted (including…

Taxonomy

Topics: Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection