Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clustering
P. Rajesh, G. Narasimha, N. Saisumanth

TL;DR
This paper introduces a privacy-preserving hierarchical document clustering method using maximal frequent item sets (MFI) to improve search relevance and protect document copyrights on the web.
Contribution
It proposes a novel MFI-based similarity measure for hierarchical clustering that reduces dimensionality and incorporates privacy preservation through equivalence relations.
Findings
Effective clustering based on MFI reduces dimensionality.
Privacy preservation prevents duplicate and unauthorized document sharing.
Improved relevance in web document search results.
Abstract
The rapid growth of the World Wide Web has posed great challenges for researchers trying to improve search efficiency over the internet. Nowadays, web document clustering has become an important research topic for surfacing the most relevant documents among the huge volumes of results returned in response to a simple query. In this paper, we first propose a novel approach that precisely defines clusters based on maximal frequent item sets (MFI) found by the Apriori algorithm. We then use the same MFI-based similarity measure for hierarchical document clustering. Considering only maximal frequent item sets decreases the dimensionality of the document set. Second, we provide privacy preservation for open web documents by avoiding duplicate documents, thereby protecting the copyrights of individual documents. This is achieved using an equivalence relation.
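The abstract does not spell out the similarity formula or the equivalence relation, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each document is a set of terms, mines frequent term sets level by level in the Apriori style, keeps the maximal ones, and then (as an assumption) scores two documents by the Jaccard overlap of the MFIs they contain. For duplicate detection it assumes the relation "has exactly the same term set", which partitions documents into equivalence classes. All function names and the toy corpus are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Apriori-style level-wise search: term sets found in >= min_support docs."""
    items = sorted({t for d in docs for t in d})
    current = [frozenset([t]) for t in items
               if sum(t in d for d in docs) >= min_support]
    frequent = []
    while current:
        frequent.extend(current)
        k = len(current[0]) + 1
        # Join step: merge pairs of frequent (k-1)-sets into k-set candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Keep only candidates that meet the support threshold.
        current = [c for c in candidates
                   if sum(c <= d for d in docs) >= min_support]
    return frequent

def maximal_itemsets(frequent):
    """Keep frequent sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

def mfi_signature(doc, mfis):
    """Reduced representation: the MFIs fully contained in the document."""
    return frozenset(m for m in mfis if m <= doc)

def mfi_similarity(sig_a, sig_b):
    """Assumed measure: Jaccard overlap of two MFI signatures."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0

def equivalence_classes(docs):
    """Assumed relation: documents are equivalent if their term sets are equal."""
    classes = {}
    for idx, d in enumerate(docs):
        classes.setdefault(frozenset(d), []).append(idx)
    return list(classes.values())

# Hypothetical toy corpus; document 2 duplicates document 0.
docs = [
    {"web", "search", "cluster"},
    {"web", "search", "query"},
    {"web", "search", "cluster"},
    {"web", "search", "query", "cluster"},
    {"privacy", "copyright"},
]
mfis = maximal_itemsets(frequent_itemsets(docs, min_support=2))
sigs = [mfi_signature(d, mfis) for d in docs]

print(mfi_similarity(sigs[0], sigs[2]))  # 1.0
print(mfi_similarity(sigs[0], sigs[3]))  # 0.5
print(equivalence_classes(docs))         # [[0, 2], [1], [3], [4]]
```

In this toy corpus six distinct terms collapse to two MFIs, which illustrates the dimensionality reduction the abstract refers to. The pairwise similarities could then feed any agglomerative clustering procedure, and each equivalence class with more than one member flags a group of duplicates.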
Taxonomy
Topics: Text and Document Classification Technologies · Face and Expression Recognition · Data Mining Algorithms and Applications
