SPPAM - Statistical PreProcessing AlgorithM

Tiago Silva; Inês Dutra

arXiv:1103.2342·cs.AI·March 14, 2011

Open Access

TL;DR

SPPAM is a preprocessing algorithm that aggregates groups of correlated records into single instances. This converts relational data into the single-table format that traditional machine learning tools expect, and improves classification performance.

Contribution

The paper introduces SPPAM, a novel data preprocessing method that enhances classification accuracy by aggregating related records prior to learning.

Findings

01 SPPAM improves classifier accuracy on correlated datasets.

02 Classifiers trained on SPPAM-aggregated data outperform those trained on all individual records.

03 The method is effective for various types of relational data.

Abstract

Most machine learning tools work with a single table where each row is an instance and each column is an attribute. Each cell of the table contains an attribute value for an instance. This representation prevents one important form of learning, namely classification based on groups of correlated records, such as multiple exams of a single patient, internet customer preferences, weather forecasts, or predictions of sea conditions for a given day. To some extent, relational learning methods, such as inductive logic programming, can capture this correlation through the use of intensional predicates added to the background knowledge. In this work, we propose SPPAM, an algorithm that aggregates past observations into a single record. We show that applying SPPAM to the original correlated data, before the learning task, can produce classifiers that are better than the ones trained using all…
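The aggregation idea can be illustrated with a minimal Python sketch. It is not the SPPAM algorithm itself, whose details are not given on this page. It only shows the general pattern of collapsing a group of correlated records into one row that a single-table learner could consume. The function name `aggregate_records`, the choice of mean/min/max as summary statistics, and the `patient` and `mass_size` fields are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_records(rows, key, numeric_fields):
    """Collapse all rows sharing `key` into one record of summary statistics.

    Hypothetical helper: the statistics used here (mean, min, max, count)
    are illustrative and are not taken from the SPPAM paper.
    """
    # Group the rows by the correlating key (e.g. one group per patient).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Emit exactly one aggregated record per group.
    aggregated = []
    for group_id, members in groups.items():
        record = {key: group_id, "n_records": len(members)}
        for field in numeric_fields:
            values = [m[field] for m in members]
            record[f"{field}_mean"] = mean(values)
            record[f"{field}_min"] = min(values)
            record[f"{field}_max"] = max(values)
        aggregated.append(record)
    return aggregated


# Made-up example data: several exams per patient.
exams = [
    {"patient": "p1", "mass_size": 10.0},
    {"patient": "p1", "mass_size": 14.0},
    {"patient": "p2", "mass_size": 7.0},
]
table = aggregate_records(exams, key="patient", numeric_fields=["mass_size"])
```

Each entry of `table` is a single instance per patient, so the result fits the one-row-per-instance layout described at the start of the abstract.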

Taxonomy

Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Bayesian Modeling and Causal Inference