SimCleaner – Sistema de Padronização de Bases de Dados utilizando Funções de Similaridade (SimCleaner: A Database Standardization System Using Similarity Functions)
Carlos Diego Nascimento Damasceno, Fabio Manoel França Lobato, Elton Rocha Moutinho, Arilene Santos de França, Ivan Ikikame de Oliveira, Ádamo Lima de Santana

TL;DR
This paper introduces SimCleaner, a tool that uses similarity functions to standardize and clean databases, improving data consistency for better pattern detection in data mining tasks.
Contribution
The paper presents a novel data cleaning tool based on similarity functions that effectively standardizes databases, demonstrated on a public security system database.
Findings
Efficient standardization of a public security database
Reusable tool applicable to various databases
Improved data consistency for data mining
Abstract
The Knowledge Discovery in Databases (KDD) process enables the detection of patterns in databases. This analysis may be compromised if the database is inconsistent, which makes data cleaning techniques necessary. This paper presents a tool based on similarity functions that supports database preprocessing. The tool performed efficiently in standardizing a database of the Public Security System of the State of Pará, and it can be reused with other databases and in other data mining projects.
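The abstract does not say which similarity functions SimCleaner uses or how it picks a standard form, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a normalized Levenshtein similarity, a hypothetical threshold of 0.8, and a hypothetical greedy rule: distinct values are visited from most to least frequent, and each one is either mapped to an already chosen canonical value that is similar enough or becomes a new canonical value itself. The names `levenshtein`, `similarity` and `standardize`, and the sample neighborhood names, are illustrative and do not come from the paper.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of longer string.

    Assumption: values are compared case-insensitively, ignoring outer spaces.
    """
    a, b = a.strip().upper(), b.strip().upper()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def standardize(values, threshold=0.8):
    """Hypothetical greedy standardization of a column of strings.

    Distinct (normalized) values are visited from most to least frequent, so
    the most common spelling in a group of similar values becomes its canonical
    form. Each value maps to the first canonical value whose similarity is at
    least `threshold`; otherwise it becomes a new canonical value.
    """
    counts = Counter(v.strip().upper() for v in values)
    canonical = []
    mapping = {}
    for value, _ in counts.most_common():
        match = next((c for c in canonical if similarity(value, c) >= threshold), None)
        if match is None:
            canonical.append(value)
            match = value
        mapping[value] = match
    return [mapping[v.strip().upper()] for v in values]


column = ["Pedreira", "PEDREIRA ", "Pedrera", "Umarizal", "Umarisal", "Marco"]
print(standardize(column))
# ['PEDREIRA', 'PEDREIRA', 'PEDREIRA', 'UMARIZAL', 'UMARIZAL', 'MARCO']
```

The greedy first-fit rule is simple but order-dependent. A value that is similar to two canonical forms is assigned to whichever one was chosen first, so a real cleaning tool would likely let a user review the proposed groupings before applying them.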
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Quality and Management
