Chromatic Learning for Sparse Datasets
Vladimir Feinberg, Peter Bailis

TL;DR
Chromatic Learning (CL) is a scalable method that compresses high-dimensional sparse data into low-dimensional dense representations using graph coloring, enabling efficient learning with minimal information loss.
Contribution
The paper introduces Chromatic Learning, a novel graph coloring-based approach for compressing sparse datasets, improving scalability and performance over traditional methods.
Findings
CL compresses datasets from over 50M features to 1024 features.
CL maintains test accuracy comparable to traditional methods.
CL enables effective deep learning on wide, sparse datasets.
Abstract
Learning over sparse, high-dimensional data frequently necessitates the use of specialized methods such as the hashing trick. In this work, we design a highly scalable alternative approach that leverages the low degree of feature co-occurrences present in many practical settings. This approach, which we call Chromatic Learning (CL), obtains a low-dimensional dense feature representation by performing graph coloring over the co-occurrence graph of features, an approach previously used as a runtime performance optimization for GBDT training. This color-based dense representation can be combined with additional dense categorical encoding approaches, e.g., submodular feature compression, to further reduce dimensionality. CL exhibits linear parallelizability and consumes memory linear in the size of the co-occurrence graph. By leveraging the structural properties of the co-occurrence graph,…
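The coloring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary features, rows given as lists of active feature ids, and a simple sequential greedy coloring in degree order. The function names (`build_cooccurrence`, `greedy_color`, `encode`) are hypothetical. Features that never appear in the same row may share a color, so each row has at most one active feature per color and can be stored as one categorical value per color.

```python
from collections import defaultdict


def build_cooccurrence(rows):
    """Map each feature to the set of features it shares at least one row with."""
    adj = defaultdict(set)
    for row in rows:
        for f in row:
            adj[f].update(g for g in row if g != f)
    return adj


def greedy_color(adj):
    """Give each feature the smallest color not used by an already-colored neighbor."""
    color = {}
    # Highest degree first: a common heuristic, assumed here for illustration.
    for f in sorted(adj, key=lambda f: (-len(adj[f]), f)):
        taken = {color[g] for g in adj[f] if g in color}
        c = 0
        while c in taken:
            c += 1
        color[f] = c
    return color


def encode(rows, color):
    """Turn each sparse row into a dense vector with one categorical slot per color."""
    n_colors = max(color.values()) + 1
    dense = []
    for row in rows:
        vec = [0] * n_colors  # 0 means no feature of this color is active
        for f in row:
            vec[color[f]] = f + 1  # store the feature id shifted by one
        dense.append(vec)
    return dense


# Toy data: five features, each row lists its active feature ids.
rows = [[0, 1], [2, 3], [0, 3], [4], [1, 2]]
adj = build_cooccurrence(rows)
color = greedy_color(adj)
dense = encode(rows, color)
print(len(color), "features ->", max(color.values()) + 1, "colors")
print(dense)
```

In this toy example the five sparse features fit into two dense columns, and the original rows can be recovered exactly from the dense vectors. The number of colors a greedy pass needs is at most the maximum co-occurrence degree plus one, which is why low co-occurrence degree matters for this approach.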
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques · Multimodal Machine Learning Applications
