Sub-Setting Algorithm for Training Data Selection in Pattern Recognition

AGaurav Arwade; Sigurdur Olafsson

arXiv:2110.06527 · stat.ML · October 14, 2021

Open Access

TL;DR

This paper introduces a sub-setting algorithm that selects simple, local data subsets for training, improving accuracy and explainability in pattern recognition tasks compared to traditional global models.

Contribution

The paper presents a novel sub-setting algorithm that identifies multiple simple local patterns, enhancing interpretability and accuracy over traditional global learning algorithms.

Findings

1. 15% better accuracy on stroke dataset
2. Identified subsets use previously unused features
3. Each subset represents a distinct data population

Abstract

Modern pattern recognition tasks use complex algorithms that exploit large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest neighbors, which are better suited to describing simple structures. While increased accuracy is often crucial, lower complexity also has value. This paper proposes a training data selection algorithm that identifies multiple subsets with simple structures. A learning algorithm trained on such a subset can classify an instance belonging to that subset more accurately than traditional learning algorithms. In other words, whereas existing pattern recognition algorithms attempt to learn a global mapping function that represents the entire dataset, we argue that an ensemble of simple local patterns may describe the data better. Hence the sub-setting algorithm identifies multiple subsets with simple local patterns…
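
The contrast between a single global mapping and an ensemble of simple local patterns can be illustrated with a toy sketch. This is not the paper's sub-setting procedure: the one-dimensional data, the hand-picked split in `subset_of`, and the decision-stump learner are all assumptions made for illustration, and every function name here is hypothetical.

```python
# Illustrative sketch only. The hand-picked subsets and the decision stump are
# assumptions for this example, not the sub-setting algorithm from the paper.

def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label, n_errors) for the
    single-threshold classifier with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x < t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[3]:
                best = (t, left, right, errors)
    return best

def predict_stump(stump, x):
    t, left, right, _ = stump
    return left if x < t else right

def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Toy data: the label alternates across four unit-width bands, so no single
# threshold separates the two classes over the whole range.
xs = [i / 10 for i in range(40)]                  # 0.0, 0.1, ..., 3.9
ys = [1 if int(x) % 2 == 0 else 0 for x in xs]

# Global model: one stump fitted to the entire dataset.
global_stump = fit_stump(xs, ys)
global_acc = accuracy([predict_stump(global_stump, x) for x in xs], ys)

# Local models: one stump per subset. The split at 2.0 is hand-picked
# (hypothetical); a real method would have to discover the subsets itself.
def subset_of(x):
    return 0 if x < 2.0 else 1

local_stumps = {}
for s in (0, 1):
    sub = [(x, y) for x, y in zip(xs, ys) if subset_of(x) == s]
    local_stumps[s] = fit_stump([x for x, _ in sub], [y for _, y in sub])

def predict_local(x):
    # Route the instance to its subset, then apply that subset's simple model.
    return predict_stump(local_stumps[subset_of(x)], x)

local_acc = accuracy([predict_local(x) for x in xs], ys)

print(global_acc)  # 0.75
print(local_acc)   # 1.0
```

Each local stump remains a one-threshold rule that can be read at a glance, which is the kind of simplicity the abstract argues for. The difficult part, finding the subsets automatically, is deliberately left out of this sketch.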

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)