Comparison of Classification Methods for Very High-Dimensional Data in Sparse Random Projection Representation
Anton Akusok, Emil Eirola

TL;DR
This paper compares classification methods for very high-dimensional sparse data represented through sparse random projections, introduces an efficient Jaccard kernel as an alternative to that projection, and finds that non-iterative methods can reach larger, more accurate models than iterative ones.
Contribution
It evaluates non-iterative and iterative classification methods on large sparse datasets and introduces an efficient Jaccard kernel for improved performance.
Findings
Non-iterative methods can find larger, more accurate models than iterative methods.
The Jaccard kernel is an effective alternative to sparse random projections.
Non-iterative methods outperform iterative ones across the application scenarios studied.
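To make the Jaccard kernel mentioned above concrete, here is a minimal sketch of Jaccard similarity on sparse binary data. It assumes each sample is stored as a set of its active (non-zero) feature indices, so the cost depends on the number of active features rather than on the millions of columns. The function names `jaccard_similarity` and `jaccard_kernel_matrix` and the convention for two empty samples are illustrative assumptions, not the authors' implementation.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets of active feature indices."""
    if not a and not b:
        return 1.0  # assumed convention: two empty samples count as identical
    intersection = len(a & b)
    union = len(a) + len(b) - intersection
    return intersection / union


def jaccard_kernel_matrix(samples, anchors):
    """Kernel matrix with one row per sample and one column per anchor sample."""
    return [[jaccard_similarity(s, c) for c in anchors] for s in samples]


# Three toy samples; feature indices may be arbitrarily large.
samples = [
    frozenset({1, 5, 9}),
    frozenset({5, 9, 1_000_000}),
    frozenset({2, 3}),
]
K = jaccard_kernel_matrix(samples, samples)
```

In this toy example the first two samples share two of four distinct features, so their kernel value is 0.5, while the third sample shares nothing with the others and gets 0. The resulting matrix could then be fed to a classifier in place of projected features.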
Abstract
The big data trend has inspired feature-driven learning tasks that cannot be handled by conventional machine learning models. Unstructured data produces very large binary matrices with millions of columns when converted to vector form. However, such data is often sparse, and hence can be made manageable through the use of sparse random projections. This work studies efficient non-iterative and iterative methods suitable for such data, evaluating the results on two representative machine learning tasks with millions of samples and features. An efficient Jaccard kernel is introduced as an alternative to the sparse random projection. Findings indicate that non-iterative methods can find larger, more accurate models than iterative methods in different application scenarios.
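As an illustration of how a sparse random projection can compress a very wide binary vector, the sketch below projects a sample, given as a set of active feature indices, into `k` dimensions. It assumes a common textbook construction in which each projection entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each and zero otherwise; the abstract does not specify the exact distribution used in the paper, so `s`, `k`, `seed`, and both function names are hypothetical. Columns are regenerated on demand from a seeded generator, so the full projection matrix is never stored.

```python
import math
import random


def projection_column(feature_index, k, s, seed=0):
    """Sparse projection column for one input feature, as {output_dim: value}.

    Assumed distribution: +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    zero otherwise. Seeding by feature index makes the column reproducible.
    """
    rng = random.Random(f"{seed}:{feature_index}")
    scale = math.sqrt(s / k)
    column = {}
    for d in range(k):
        u = rng.random()
        if u < 1.0 / (2 * s):
            column[d] = scale
        elif u < 1.0 / s:
            column[d] = -scale
    return column


def project(active_features, k, s, seed=0):
    """Project a binary sample, given by its active feature indices, to k dimensions."""
    out = [0.0] * k
    for j in sorted(active_features):
        for d, value in projection_column(j, k, s, seed).items():
            out[d] += value
    return out


# A sample with three active features out of millions of possible columns.
x = {3, 42, 7_000_000}
z = project(x, k=64, s=3)
```

Only the active features are visited, so the work per sample scales with its number of non-zeros times `k`, not with the total number of columns.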
Taxonomy
Topics: Face and Expression Recognition · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
