Sub-Setting Algorithm for Training Data Selection in Pattern Recognition
Gaurav Arwade, Sigurdur Olafsson

TL;DR
This paper introduces a sub-setting algorithm that selects simple, local data subsets for training, improving accuracy and explainability in pattern recognition tasks compared to traditional global models.
Contribution
The paper presents a novel sub-setting algorithm that identifies multiple simple local patterns, enhancing interpretability and accuracy over traditional global learning algorithms.
Findings
15% better accuracy on stroke dataset
Identified subsets use previously unused features
Each subset represents a distinct data population
Abstract
Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor, which are better suited to describing simple structures. While increased accuracy is often crucial, less complexity also has value. This paper proposes a training data selection algorithm that identifies multiple subsets with simple structures. A learning algorithm trained on such a subset can classify an instance belonging to the subset with better accuracy than the traditional learning algorithms. In other words, while existing pattern recognition algorithms attempt to learn a global mapping function to represent the entire dataset, we argue that an ensemble of simple local patterns may better describe the data. Hence the sub-setting algorithm identifies multiple subsets with simple local patterns…
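The contrast between one global mapping and an ensemble of simple local patterns can be illustrated with a toy sketch. This is not the paper's sub-setting algorithm, whose details are not given in the abstract: the subset membership rule `subset_of`, the toy data, and the one-threshold "stump" learner are all assumptions made here for illustration. Accuracy is measured on the training data, so the sketch shows only what each model can express, not how well it generalizes.

```python
from collections import Counter

def fit_stump(X, y):
    """Fit a one-feature threshold rule (decision stump) by exhaustive search."""
    majority = Counter(y).most_common(1)[0][0]
    # Baseline: predict the majority label everywhere (no split).
    best = (sum(lab != majority for lab in y), None, None, majority, majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab

def subset_of(row):
    # Hypothetical membership rule; it stands in for the subsets that the
    # paper's algorithm would discover from the data.
    return "A" if row[0] < 5 else "B"

def make_toy_data():
    # Two populations: in A the label depends on feature 1, in B on feature 2.
    X, y = [], []
    for x1 in range(1, 5):
        for x2 in range(1, 5):
            X.append([1, x1, x2]); y.append(int(x1 > 2))
            X.append([9, x1, x2]); y.append(int(x2 > 2))
    return X, y

def accuracy(predict, X, y):
    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)

X, y = make_toy_data()
global_stump = fit_stump(X, y)  # one simple model for the whole dataset

local_stumps = {}               # one simple model per subset
for name in ("A", "B"):
    idx = [i for i, row in enumerate(X) if subset_of(row) == name]
    local_stumps[name] = fit_stump([X[i] for i in idx], [y[i] for i in idx])

global_acc = accuracy(lambda r: predict_stump(global_stump, r), X, y)
local_acc = accuracy(lambda r: predict_stump(local_stumps[subset_of(r)], r), X, y)
print(global_acc, local_acc)  # prints: 0.75 1.0
```

In this toy setting each local stump splits on a different feature (feature 1 for subset A, feature 2 for subset B), which a single global stump cannot do. That loosely mirrors the listed finding that identified subsets use features a global model leaves unused.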
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)
