Enhancing Real-Time Master Data Management with Complex Match and Merge Algorithms
Durai Rajamanickam

TL;DR
This paper presents a novel real-time master data management (MDM) algorithm that combines deterministic, fuzzy, and machine learning techniques and uses distributed computing for scalable, accurate, low-latency data consolidation.
Contribution
The paper introduces a new complex match and merge algorithm optimized for real-time MDM, using distributed processing and machine learning to improve accuracy and scalability.
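The matching half of a match and merge algorithm can be sketched as a tiered rule: try an exact (deterministic) rule first, and fall back to fuzzy similarity only when no exact key decides the pair. This is a minimal single-machine illustration, not the paper's PySpark implementation. The field names (`tax_id`, `dob`, `name`), the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the fuzzy measure are all hypothetical choices, and the machine learning stage is left out.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical record shape: field name -> value (None when missing).
Record = Dict[str, Optional[str]]

# Hypothetical threshold chosen for illustration; not a value reported in the paper.
FUZZY_THRESHOLD = 0.85


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] using difflib's Ratcliff/Obershelp ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(a: Record, b: Record,
          threshold: float = FUZZY_THRESHOLD) -> Tuple[bool, Optional[str]]:
    """Tiered match: deterministic rule first, fuzzy rule second.

    Returns (is_match, name_of_the_tier_that_fired)."""
    # Tier 1 (deterministic): identical, non-empty identifier.
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return True, "deterministic"
    # Tier 2 (fuzzy): same date of birth and sufficiently similar names.
    if a.get("dob") and a.get("dob") == b.get("dob"):
        if name_similarity(a.get("name") or "", b.get("name") or "") >= threshold:
            return True, "fuzzy"
    return False, None


if __name__ == "__main__":
    x = {"name": "Jonathan  Smith", "dob": "1980-02-01", "tax_id": None}
    y = {"name": "jonathon smith", "dob": "1980-02-01", "tax_id": None}
    print(match(x, y))  # (True, 'fuzzy')
```

Ordering the tiers this way keeps the more expensive string comparison off pairs that an exact key already resolves; the two example names differ by one letter after normalization and score about 0.93.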
Findings
90% accuracy on datasets of up to 10 million records
30% latency improvement over traditional MDM systems
Strong potential in healthcare and finance domains
Abstract
Master Data Management (MDM) ensures data integrity, consistency, and reliability across an organization's systems. I introduce a novel complex match and merge algorithm optimized for real-time MDM solutions. By combining deterministic matching, fuzzy matching, and machine learning-based conflict resolution, the proposed method accurately identifies duplicates and consolidates records in large-scale datasets. I implemented the algorithm in PySpark on Databricks, where it benefits from distributed computing and from Delta Lake for scalable, reliable data processing. Comprehensive performance evaluations demonstrate 90% accuracy on datasets of up to 10 million records while maintaining low latency and high throughput, a significant improvement over existing MDM approaches. The method shows strong potential in domains such as healthcare and finance, with an overall 30% improvement in latency…
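The consolidation step described in the abstract can be sketched in two stages: group pairwise match decisions into clusters with union-find, then build one consolidated ("golden") record per cluster. This is a hypothetical single-machine sketch rather than the author's code. The abstract describes machine learning-based conflict resolution running on PySpark and Delta Lake; here a simple rule-based survivorship policy (the most recently updated non-null value wins) stands in for that model. The field names `source_id` and `updated_at` are assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record shape: "source_id" names the source-system row and
# "updated_at" is an ISO-8601 date string, so string order equals time order.
Record = Dict[str, Any]


def _find(parent: List[int], x: int) -> int:
    """Return the cluster root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(n: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Group record indices into clusters from pairwise match decisions."""
    parent = list(range(n))
    for i, j in pairs:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[rj] = ri  # union the two clusters
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(i)
    return list(groups.values())


def merge_cluster(records: List[Record]) -> Record:
    """Rule-based survivorship (stand-in for ML conflict resolution):
    each field takes the non-null value from the most recently updated record."""
    newest_first = sorted(records, key=lambda r: r.get("updated_at") or "", reverse=True)
    golden: Record = {}
    for rec in newest_first:
        for field, value in rec.items():
            if field != "source_id" and value is not None and field not in golden:
                golden[field] = value
    golden["source_ids"] = sorted(r["source_id"] for r in records)  # keep lineage
    return golden


def consolidate(records: List[Record], pairs: List[Tuple[int, int]]) -> List[Record]:
    """Cluster matched records and emit one golden record per cluster."""
    return [merge_cluster([records[i] for i in group])
            for group in cluster(len(records), pairs)]


if __name__ == "__main__":
    records = [
        {"source_id": "A1", "name": "Jonathan Smith", "phone": None, "updated_at": "2024-03-10"},
        {"source_id": "B7", "name": "Jonathon Smith", "phone": "555-0100", "updated_at": "2024-01-05"},
        {"source_id": "C3", "name": "Maria Garcia", "phone": "555-0199", "updated_at": "2023-11-20"},
    ]
    for golden in consolidate(records, pairs=[(0, 1)]):
        print(golden)
```

With the example data, records A1 and B7 collapse into one golden record: the name comes from A1 because it is newer, while the phone number comes from B7 because A1's phone is null. A learned conflict-resolution model would replace `merge_cluster` without changing the clustering stage.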
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Graph Theory and Algorithms
