DataGrinder: Fast, Accurate, Fully non-Parametric Classification Approach Using 2D Convex Hulls
Mohammad Khabbaz

TL;DR
DataGrinder introduces a fast, deterministic, non-parametric classification method using 2D convex hulls that achieves high accuracy and scalability, suitable for large datasets and parallel processing.
Contribution
It presents a novel classification algorithm based on 2D convex hulls, with linear expected running time, O(n), for the hull computation. The method is fully deterministic and scalable, in contrast to probabilistic and sampling-based methods.
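To make the idea of classifying with 2D convex hulls concrete, here is a minimal Python sketch. It is not the paper's algorithm, which this summary does not describe in detail. It assumes one hull per class, built with Andrew's monotone chain (O(n log n) because of the sort, not the linear expected time reported for the paper), and a hypothetical rule that labels a query point only when it falls inside exactly one class hull. The names convex_hull, in_hull, and classify are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or be collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def in_hull(hull: List[Point], q: Point) -> bool:
    """True if q is inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hulls are ignored in this sketch
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def classify(hulls: Dict[str, List[Point]], q: Point) -> Optional[str]:
    """Assumed rule: return the label whose hull alone contains q, else None."""
    matches = [label for label, hull in hulls.items() if in_hull(hull, q)]
    return matches[0] if len(matches) == 1 else None


# Toy training data: two well-separated classes in the plane.
training = {
    "a": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)],
    "b": [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)],
}
hulls = {label: convex_hull(pts) for label, pts in training.items()}
```

With this toy data, classify(hulls, (0.5, 0.5)) falls inside the hull of class "a" only, while a point between the two squares lies in neither hull and is left unlabeled. A real classifier would need a policy for points covered by several hulls or by none.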
Findings
Competitive accuracy compared to existing classifiers
Linear expected running time for convex hull computation
Effective scalability for large datasets
Abstract
Data mining technologies have long since made their way into the field of data management. Classification is one of the most important data mining tasks, used for label prediction, categorization of objects into groups, advertisement, and data management. In this paper, we focus on the standard classification problem: predicting unknown labels in Euclidean space. Most efforts in the machine learning community are devoted to probabilistic algorithms that rely heavily on calculus and linear algebra. Most of these techniques have scalability issues on big data and are hard to parallelize if they are to maintain their high accuracy in their standard form. Sampling is a new direction for improving scalability, using many small parallel classifiers. In this paper, rather than conventional sampling methods, we focus on a discrete classification algorithm…
Taxonomy
Topics: Machine Learning and Data Classification · Data Management and Algorithms · Machine Learning and Algorithms
