Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clustering

P. Rajesh; G. Narasimha; N. Saisumanth

arXiv:1207.2900·cs.DB·July 13, 2012

Open Access

TL;DR

This paper introduces a privacy-preserving hierarchical document clustering method using maximal frequent item sets (MFI) to improve search relevance and protect document copyrights on the web.

Contribution

It proposes a novel MFI-based similarity measure for hierarchical clustering that reduces dimensionality and incorporates privacy preservation through equivalence relations.

Findings

01. Effective clustering based on MFI reduces dimensionality.

02. Privacy preservation prevents duplicate and unauthorized document sharing.

03. Improved relevance in web document search results.

Abstract

The rapid growth of the World Wide Web has posed great challenges for researchers seeking to improve search efficiency over the internet. Web document clustering has become an important research topic because it helps surface the most relevant documents from the huge volume of results returned in response to a simple query. In this paper, we first propose a novel approach that precisely defines clusters based on maximal frequent item sets (MFI) mined with the Apriori algorithm. We then use a similarity measure based on the same maximal frequent item sets for hierarchical document clustering. Considering only maximal frequent item sets reduces the dimensionality of the document set. Second, we provide privacy preservation for open web documents by avoiding duplicate documents, thereby protecting the individual copyrights of documents. This is achieved using an equivalence relation.
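
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula or equivalence relation, so the sketch assumes (a) each document is a set of terms, (b) similarity is the Jaccard overlap of the maximal frequent item sets each document contains, and (c) two documents are equivalent when their term sets are identical. All function names (`frequent_itemsets`, `maximal_frequent_itemsets`, `mfi_similarity`, `duplicate_classes`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Apriori-style level-wise search; returns {itemset: support count}."""
    # Level 1: count how many documents contain each single term.
    counts = defaultdict(int)
    for doc in docs:
        for term in doc:
            counts[frozenset([term])] += 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-sets into size-k candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support for the surviving candidates.
        level = {}
        for cand in candidates:
            support = sum(1 for doc in docs if cand <= doc)
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
    return frequent


def maximal_frequent_itemsets(frequent):
    """Keep only frequent item sets with no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def mfi_profile(doc, mfis):
    """Represent a document by the indices of the MFIs it fully contains."""
    return frozenset(i for i, m in enumerate(mfis) if m <= doc)


def mfi_similarity(doc_a, doc_b, mfis):
    """Assumed measure: Jaccard overlap of the two MFI profiles."""
    pa, pb = mfi_profile(doc_a, mfis), mfi_profile(doc_b, mfis)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def duplicate_classes(docs):
    """Assumed equivalence relation: identical term sets. Returns the
    equivalence classes (lists of document indices) with more than one member."""
    classes = defaultdict(list)
    for i, doc in enumerate(docs):
        classes[frozenset(doc)].append(i)
    return [ids for ids in classes.values() if len(ids) > 1]
```

Representing each document by the few MFIs it contains, rather than by its full term vector, is where the dimensionality reduction mentioned in the abstract would come from. The pairwise similarities could then feed any agglomerative clustering routine, and the duplicate classes could be collapsed to one representative before clustering.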

Taxonomy

Topics: Text and Document Classification Technologies · Face and Expression Recognition · Data Mining Algorithms and Applications