Frequency Sensitive Duplicate Detection Using Multi-Metric Spaces
Debjyoti Chatterjee, Shashi Bajaj Mukherjee

TL;DR
This paper introduces multi-metric spaces on multisets to incorporate frequency information into similarity measures, improving duplicate detection accuracy in data-intensive applications.
Contribution
It presents a novel multi-metric space framework that models frequency-sensitive similarities, addressing limitations of classical metric spaces.
Findings
Multi-metric spaces effectively incorporate frequency information.
Frequency-sensitive duplicate detection outperforms classical methods.
The approach enhances accuracy in data-intensive systems.
Abstract
Classical metric spaces often fail to model data-intensive systems where repetition and frequency of values are meaningful. In applications such as transactional databases, sensor logs, and record linkage, conventional distance measures ignore multiplicity information, leading to information loss and incorrect similarity judgments. This paper introduces multi-metric spaces defined on multisets and valued in the multi-real number system, providing a principled way to incorporate frequency into distance computations. We demonstrate the usefulness of multi-metrics through a frequency-sensitive duplicate detection example, showing improved accuracy over classical metric-based approaches.
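To make the abstract's point about lost multiplicity concrete, here is a minimal Python sketch. It does not implement the paper's multi-metric or the multi-real number system, neither of which is defined on this page. As a stand-in it compares a set-based symmetric-difference distance with a count-based multiset distance (the sum of absolute differences in element counts). The function names and the sample logs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def set_distance(a, b):
    """Size of the symmetric difference of the underlying sets.

    Multiplicity is discarded, so repeated values have no effect.
    """
    return len(set(a) ^ set(b))


def multiset_distance(a, b):
    """Sum of absolute differences in per-element counts.

    This is a standard multiset distance used here as an assumption,
    not the multi-metric defined in the paper.
    """
    ca, cb = Counter(a), Counter(b)
    # Counter returns 0 for missing keys, so the union of keys is safe.
    return sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())


# Hypothetical event logs: same distinct values, different frequencies.
log_a = ["login", "login", "login", "purchase"]
log_b = ["login", "purchase", "purchase", "purchase"]

print(set_distance(log_a, log_b))       # 0: the set view treats them as duplicates
print(multiset_distance(log_a, log_b))  # 4: the count view separates them
```

Under the set view the two logs are indistinguishable, so a duplicate detector built on it would merge them. The count-based distance keeps them apart, which is the kind of frequency sensitivity the abstract describes.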
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Time Series Analysis and Forecasting
