SPPAM - Statistical PreProcessing AlgorithM
Tiago Silva, Inês Dutra

TL;DR
SPPAM is a preprocessing algorithm that aggregates correlated records into single instances, improving classification performance on relational data by transforming it into a suitable format for traditional machine learning tools.
Contribution
The paper introduces SPPAM, a novel data preprocessing method that enhances classification accuracy by aggregating related records prior to learning.
Findings
SPPAM improves classifier accuracy on correlated datasets.
Aggregating data with SPPAM outperforms using all individual records.
The method is effective for various types of relational data.
Abstract
Most machine learning tools work with a single table where each row is an instance and each column is an attribute. Each cell of the table contains an attribute value for an instance. This representation prevents one important form of learning, namely classification based on groups of correlated records, such as multiple exams of a single patient, internet customer preferences, weather forecasts, or predictions of sea conditions for a given day. To some extent, relational learning methods, such as inductive logic programming, can capture this correlation through the use of intensional predicates added to the background knowledge. In this work, we propose SPPAM, an algorithm that aggregates past observations into a single record. We show that applying SPPAM to the original correlated data, before the learning task, can produce classifiers that are better than the ones trained using all…
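To make the idea of collapsing a group of correlated records into one instance concrete, here is a minimal Python sketch. It is not SPPAM itself: the abstract does not state which statistics the algorithm computes, so the choice of mean, population standard deviation, and last value is an assumption made purely for illustration. The function name `aggregate_records`, the field names, and the patient data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_records(rows, key, numeric_fields):
    """Collapse all rows sharing `key` into one record of summary statistics.

    Illustrative only: the statistics used here (mean, population standard
    deviation, last observed value) are assumptions, not the SPPAM definition.
    """
    # Group the raw rows by the entity they belong to (e.g. a patient).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    aggregated = []
    for group_id, members in groups.items():
        record = {key: group_id, "n_records": len(members)}
        for field in numeric_fields:
            values = [m[field] for m in members]
            record[f"{field}_mean"] = mean(values)
            record[f"{field}_std"] = pstdev(values)
            record[f"{field}_last"] = values[-1]
        aggregated.append(record)
    return aggregated


# Hypothetical data: several exams per patient, one row per exam.
exams = [
    {"patient": "p1", "glucose": 90.0},
    {"patient": "p1", "glucose": 110.0},
    {"patient": "p2", "glucose": 140.0},
]

# One row per patient, suitable for a single-table learner.
table = aggregate_records(exams, "patient", ["glucose"])
```

The resulting `table` has one row per patient, which is the single-table shape that conventional classifiers expect. Which summary statistics are actually useful depends on the data and on the method described in the full paper.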
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Bayesian Modeling and Causal Inference
