Efficient 2-Step Protocol and Its Discriminative Feature Selections in Secure Similar Document Detection
Sang-Pil Kim, Myeong-Sun Gil, Yang-Sae Moon, and Hee-Sun Won

TL;DR
This paper introduces an efficient 2-step protocol for secure similar document detection. It uses feature selection as a lower-dimensional transformation to cut the computation and communication overhead of the existing 1-step protocol, and it presents discriminative feature selections to maximize the protocol's performance.
Contribution
The paper proposes a novel 2-step protocol with discriminative feature selection methods to enhance efficiency in secure document similarity detection.
Findings
2-4 orders of magnitude performance improvement over the 1-step protocol
Effective feature selection methods for reducing computation
Empirical validation of the protocol's correctness and efficiency
Abstract
Secure similar document detection (SSDD) identifies similar documents held by two parties without either party disclosing its sensitive documents to the other. In this paper, we propose an efficient 2-step protocol that exploits feature selection as a lower-dimensional transformation, and we present discriminative feature selections to maximize the performance of the protocol. We first show that the existing 1-step protocol incurs serious computation and communication overhead for high-dimensional document vectors. To alleviate this overhead, we then present the feature selection-based 2-step protocol and formally prove its correctness. The proposed 2-step protocol works as follows: (1) in the filtering step, it uses low-dimensional vectors obtained by the feature selection to filter out non-similar documents; (2) in the post-processing step, it identifies similar…
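The filter-then-refine idea behind the two steps can be illustrated with a minimal, non-secure sketch. Everything here is an assumption made for illustration: the code works on plaintext vectors with no cryptographic protocol, it assumes Euclidean distance with a tolerance `eps`, and it ranks dimensions by variance as a stand-in for the paper's feature selections, which the truncated abstract does not specify. The function names (`select_features`, `two_step_search`, `one_step_search`) are hypothetical. The one property the sketch relies on is that a Euclidean distance computed over a subset of dimensions never exceeds the full-dimensional distance, so filtering on the subset cannot discard a truly similar document.

```python
import math
import random


def select_features(docs, k):
    """Return k dimension indices with the largest variance (illustrative choice only)."""
    def variance(j):
        col = [d[j] for d in docs]
        mean = sum(col) / len(col)
        return sum((x - mean) ** 2 for x in col) / len(col)

    ranked = sorted(range(len(docs[0])), key=variance, reverse=True)
    return sorted(ranked[:k])  # ascending index order


def distance(u, v, dims=None):
    """Euclidean distance over all dimensions, or only over the given ones."""
    idx = range(len(u)) if dims is None else dims
    return math.sqrt(sum((u[j] - v[j]) ** 2 for j in idx))


def two_step_search(query, docs, eps, dims):
    # Step 1 (filtering): the low-dimensional distance lower-bounds the full
    # distance, so any document farther than eps here can be discarded safely.
    candidates = [i for i, d in enumerate(docs) if distance(query, d, dims) <= eps]
    # Step 2 (post-processing): full-dimensional check on the survivors only.
    similar = [i for i in candidates if distance(query, docs[i]) <= eps]
    return similar, candidates


def one_step_search(query, docs, eps):
    """Baseline: full-dimensional comparison against every document."""
    return [i for i, d in enumerate(docs) if distance(query, d) <= eps]


if __name__ == "__main__":
    random.seed(0)
    docs = [[random.random() for _ in range(20)] for _ in range(50)]
    dims = select_features(docs, 5)
    similar, candidates = two_step_search(docs[0], docs, 1.2, dims)
    print(len(candidates), "candidates after filtering;", len(similar), "similar documents")
```

In this plaintext setting the saving is only the number of full-dimensional comparisons avoided. In a secure setting, where each comparison is far more expensive, pruning candidates with low-dimensional vectors would matter correspondingly more, and the more discriminative the selected dimensions, the fewer candidates survive the filtering step.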
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Handwritten Text Recognition Techniques · Internet Traffic Analysis and Secure E-voting
