SimCleaner -- Sistema de Padronização de Bases de Dados utilizando Funções de Similaridade

Carlos Diego Nascimento Damasceno; Fabio Manoel França Lobato; Elton Rocha Moutinho; Arilene Santos de França; Ivan Ikikame de Oliveira; Ádamo Lima de Santana

arXiv:2107.12884 · cs.DB · August 12, 2021

Open Access

TL;DR

This paper introduces SimCleaner, a tool that uses similarity functions to standardize and clean databases, improving data consistency for better pattern detection in data mining tasks.

Contribution

The paper presents a novel data cleaning tool based on similarity functions that effectively standardizes databases, demonstrated on a public security system database.

Findings

01. Efficient standardization of a public security database
02. Reusable tool applicable to various databases
03. Improved data consistency for data mining

Abstract

The Knowledge Discovery in Databases (KDD) process enables the detection of patterns in databases, but this analysis may be compromised if the database is inconsistent, which makes data cleaning techniques necessary. This paper presents a tool based on similarity functions to support database preprocessing. The tool performed efficiently in standardizing a database of the Public Security System of the State of Pará, and it may be reused with other databases and in other data mining projects.
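
As a rough illustration of the idea in the abstract, standardizing inconsistent values with a similarity function, the sketch below maps raw strings to their closest canonical form. It is not SimCleaner's implementation: the abstract does not say which similarity functions the tool uses, so Python's `difflib.SequenceMatcher` stands in for them, and the function names, the 0.75 threshold, and the sample city names are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score two strings in [0, 1], ignoring case and surrounding whitespace.

    Assumption: difflib's ratio is used here only as a stand-in similarity function.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def standardize(values, canonical, threshold=0.75):
    """Replace each raw value with its most similar canonical form.

    Values whose best score is below `threshold` (a hypothetical cutoff)
    are left unchanged so they can be reviewed by hand.
    """
    cleaned = []
    for value in values:
        best = max(canonical, key=lambda c: similarity(value, c))
        cleaned.append(best if similarity(value, best) >= threshold else value)
    return cleaned


# Illustrative data only; these values are not taken from the paper's database.
raw = ["Belem", "BELÉM ", "Ananindeua", "Ananideua", "Santarem", "Unknown"]
canonical = ["Belém", "Ananindeua", "Santarém"]
result = standardize(raw, canonical)
print(result)
```

In this sketch, spelling variants such as a missing accent or a dropped letter collapse to one canonical value, while a value with no close match is kept as-is instead of being forced into a wrong category. The threshold controls that trade-off and would need tuning per column.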

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Quality and Management