TL;DR
Joker is a scalable joint optimization framework for kernel models (kernel ridge regression, logistic regression, and support vector machines) that cuts memory usage by up to 90% while keeping training time and accuracy comparable to or better than existing large-scale methods.
Contribution
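The paper introduces a joint optimization framework that covers multiple kernel models. It combines a dual block coordinate descent method with trust region (DBCD-TR) and kernel approximation with randomized features to address the scalability and memory limits of large-scale kernel methods.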
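As an illustration of what kernel approximation with randomized features can look like, here is a minimal sketch using random Fourier features for a Gaussian kernel. This is one standard construction and is not taken from the paper; the function name `rff_features`, the kernel choice, and all parameter values are assumptions made for the example.

```python
import numpy as np

def rff_features(X, num_features, gamma, rng):
    """Map X of shape (n, d) to random Fourier features whose inner products
    approximate the Gaussian kernel exp(-gamma * ||x - z||^2).
    Hypothetical helper for illustration, not the paper's code."""
    d = X.shape[1]
    # Frequencies follow the kernel's spectral density: N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    # Random phases make a single cosine feature sufficient.
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
gamma = 0.1

Z = rff_features(X, 4000, gamma, rng)   # (200, 4000) explicit feature map
K_approx = Z @ Z.T                      # approximate kernel matrix

# Exact kernel matrix, computed only to check the approximation.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-gamma * sq_dists)
max_err = np.abs(K_approx - K_exact).max()
```

The approximation error shrinks as the number of random features grows, so the feature count trades memory for accuracy.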
Findings
Reduces memory usage by up to 90%
Maintains or improves training time and accuracy
Applies to diverse kernel models, including KRR, logistic regression, and support vector machines
Abstract
Kernel methods are powerful tools for nonlinear learning with well-established theory. The scalability issue has been their long-standing challenge. Despite the existing success, there are two limitations in large-scale kernel methods: (i) The memory overhead is too high for users to afford; (ii) existing efforts mainly focus on kernel ridge regression (KRR), while other models lack study. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90% memory but achieves comparable training time and performance (or even better) than the…
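To make the idea of dual block coordinate descent concrete, the sketch below applies it to the KRR dual, minimizing 0.5 * a^T (K + lam I) a - y^T a over one block of dual variables at a time. It is not the paper's DBCD-TR: the block subproblem here is quadratic and solved exactly, so no trust region is used, and it uses the exact Gaussian kernel rather than randomized features. The function names, block size, and all parameter values are assumptions made for illustration. The memory saving comes from building only the kernel rows of the current block, so the full n-by-n kernel matrix is never stored during training.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_dual_bcd(X, y, lam, gamma, block_size=64, epochs=50, seed=0):
    """Block coordinate descent on the KRR dual (illustrative sketch only)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, block_size):
            B = perm[start:start + block_size]
            # Only |B| kernel rows are materialized: O(|B| * n) memory.
            K_B = gaussian_kernel(X[B], X, gamma)
            # Gradient of the dual objective restricted to block B.
            grad_B = K_B @ alpha + lam * alpha[B] - y[B]
            # Block Hessian; a Newton step minimizes the quadratic exactly.
            H_BB = K_B[:, B] + lam * np.eye(len(B))
            alpha[B] -= np.linalg.solve(H_BB, grad_B)
    return alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
lam, gamma = 1.0, 1.0
alpha = krr_dual_bcd(X, y, lam, gamma)
```

For losses such as logistic regression or SVM, the block subproblem is no longer a plain unconstrained quadratic, which is where a safeguarded step such as a trust region becomes relevant.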