Can pre-trained Transformers be used in detecting complex sensitive sentences? -- A Monsanto case study

Roelien C. Timmer; David Liebowitz; Surya Nepal; Salil S. Kanhere

arXiv:2203.06793 · cs.CL · March 15, 2022

Open Access

TL;DR

This study evaluates the effectiveness of pre-trained transformer models, specifically BERT, in detecting complex sensitive sentences within organizational documents, demonstrating significant performance improvements over traditional methods.

Contribution

The paper demonstrates that fine-tuned BERT models outperform traditional keyword and machine learning approaches in detecting complex sensitive information in diverse document categories.

Findings

1. Fine-tuned BERT achieves F2 scores up to 65.79% higher than traditional keyword and machine learning approaches in sensitive sentence detection.

2. Transformer models outperform traditional models across all tested document categories.

3. These gains suggest that pre-trained transformers are well suited to sensitive information detection.
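
The F2 score cited in the findings is the F-beta measure with beta = 2, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which weights recall more heavily than precision. The sketch below is only an illustration of that metric: the function name `f_beta` and the confusion counts are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Compute the F-beta score from confusion counts.

    tp, fp, fn are true positives, false positives and false negatives.
    With beta > 1, recall counts for more than precision.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detectors evaluated on 100 truly sensitive sentences.
# Detector A flags aggressively: recall 0.9, precision 0.6.
f2_recall_heavy = f_beta(tp=90, fp=60, fn=10)
# Detector B flags cautiously: recall 0.6, precision about 0.86.
f2_precision_heavy = f_beta(tp=60, fp=10, fn=40)

print(f"F2, recall-heavy detector:    {f2_recall_heavy:.3f}")
print(f"F2, precision-heavy detector: {f2_precision_heavy:.3f}")
```

Under F1 (beta = 1) these two hypothetical detectors score almost the same, while F2 clearly favours the one that misses fewer sensitive sentences. That is a reasonable preference when a missed sensitive sentence costs more than a false alarm.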

Abstract

Every organisation releases information in a variety of forms, ranging from annual reports to legal proceedings. Such documents may contain sensitive information, and releasing them openly may leak confidential information. Detecting sentences that contain sensitive information can help organisations prevent such leaks. This is especially challenging when the sentences contain a substantial amount of information or are paraphrased versions of known sensitive content. Current approaches to sensitive information detection in such complex settings rely on keyword matching or standard machine learning models. In this paper, we explore whether pre-trained transformer models are well suited to detecting complex sensitive information. Pre-trained transformers are typically trained on an enormous…

Taxonomy

Topics: Digital and Cyber Forensics · Cybercrime and Law Enforcement Studies · Misinformation and Its Impacts

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Residual Connection · Weight Decay · Layer Normalization · Linear Warmup With Linear Decay · WordPiece