Frost: A Platform for Benchmarking and Exploring Data Matching Results
Martin Graf, Lukas Laskowski, Florian Papsdorf, Florian Sold, Roland Gremmelspacher, Felix Naumann, Fabian Panse

TL;DR
Frost is an open-source platform that enables comprehensive benchmarking and exploration of data matching solutions, integrating quality, cost, and effort metrics to improve data deduplication processes.
Contribution
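The paper introduces a novel platform that combines benchmarks, metrics, and exploration tools for data matching. It addresses two gaps in existing evaluation methods: their neglect of factors beyond result quality, such as business requirements, and their lack of support for exploring matching results.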
Findings
Supports systematic exploration of matching results
Integrates multiple quality and cost metrics
Open-source implementation in Snowman
Abstract
"Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue due to it. Duplicates - multiple but different representations of the same real-world entities - are among the main reasons for poor data quality, so finding and configuring the right deduplication solution is essential. Existing data matching benchmarks focus on the quality of matching results and neglect other important factors, such as business requirements. Additionally, they often do not support the exploration of data matching results. To address this gap between the mere counting of record pairs vs. a comprehensive means to evaluate data matching solutions, we present the Frost platform. It combines existing benchmarks, established quality metrics, cost and effort metrics, and exploration techniques, making it the first platform to allow systematic exploration to understand…
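The "mere counting of record pairs" that the abstract refers to can be illustrated with pair-based precision, recall, and F1. The following is a minimal sketch, not Frost's or Snowman's actual API: the function names `pairs_from_clusters` and `pair_metrics`, the record IDs, and the example clusters are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand clusters of record IDs into a set of unordered duplicate pairs."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pair_metrics(predicted, truth):
    """Compute pair-based quality metrics from two sets of duplicate pairs."""
    tp = len(predicted & truth)   # pairs correctly reported as duplicates
    fp = len(predicted - truth)   # reported pairs that are not duplicates
    fn = len(truth - predicted)   # true duplicate pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and matching result, given as clusters of record IDs.
truth = pairs_from_clusters([{"r1", "r2", "r3"}, {"r4", "r5"}])
predicted = pairs_from_clusters([{"r1", "r2"}, {"r4", "r5", "r6"}])

metrics = pair_metrics(predicted, truth)
print(metrics)
```

Numbers like these summarize result quality only. They say nothing about the cost or effort of obtaining the result, and they do not show which pairs were misclassified, which is the gap the abstract says Frost is meant to address.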
