# SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms

**Authors:** Yifei Jin, Lingxiao Huang, Jian Li

arXiv: 1705.07252 · 2018-01-30

## TL;DR

This paper introduces saddle point optimization algorithms for two SVM variants, hard-margin SVM and $\nu$-SVM. The algorithms compute $(1-\epsilon)$-approximate solutions in nearly linear time, extend naturally to distributed settings, and in the authors' experiments converge faster than previous methods on high-dimensional, large, dense data sets.

## Contribution

The paper presents the first nearly linear time algorithm for $\nu$-SVM and improved algorithms for hard-margin SVM using saddle point optimization, with theoretical guarantees and distributed efficiency.
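
One standard way to view hard-margin SVM as a saddle point problem is sketched below. The notation is an assumption made for illustration and may differ from the paper's: the columns of $P \in \mathbb{R}^{d \times n_1}$ and $Q \in \mathbb{R}^{d \times n_2}$ are the points of the two classes, and $\Delta_m$ is the probability simplex in $\mathbb{R}^m$. The dual of hard-margin SVM asks for the distance between the two convex hulls, and writing the Euclidean norm as $\|v\|_2 = \max_{\|w\|_2 \le 1} w^\top v$ turns it into a bilinear min-max problem:

$$
\min_{\eta \in \Delta_{n_1},\, \xi \in \Delta_{n_2}} \|P\eta - Q\xi\|_2
= \min_{\eta \in \Delta_{n_1},\, \xi \in \Delta_{n_2}} \; \max_{\|w\|_2 \le 1} w^\top (P\eta - Q\xi)
= \max_{\|w\|_2 \le 1} \; \min_{\eta \in \Delta_{n_1},\, \xi \in \Delta_{n_2}} w^\top (P\eta - Q\xi).
$$

The last equality follows from the minimax theorem, since the objective is bilinear and both feasible sets are convex and compact. The maximizing $w$ is the normal direction of the separating hyperplane, and the common value is the distance between the two hulls.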

## Key findings

- Achieves a $(1-\epsilon)$-approximation in $\tilde{O}(nd + n\sqrt{d/\epsilon})$ time for both variants, where $n$ is the number of points and $d$ is the dimensionality.
- First nearly linear time algorithm for $\nu$-SVM.
- Distributed algorithms require $\tilde{O}(k(d + \sqrt{d/\epsilon}))$ communication for $k$ clients, nearly matching the lower bound.
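
To get a feel for the running-time bounds stated in the abstract, the following sketch evaluates their leading terms for one set of parameters. The function names and the sample values of `n`, `d`, and `eps` are assumptions chosen for illustration; constants and logarithmic factors hidden by the $O$ and $\tilde{O}$ notation are dropped, so the numbers indicate scaling only, not measured performance.

```python
import math

def gilbert_cost(n, d, eps):
    """Leading term n*d/eps of the Gilbert-algorithm bound (constants dropped)."""
    return n * d / eps

def saddle_cost(n, d, eps):
    """Leading terms n*d + n*sqrt(d/eps) of the saddle point bound (log factors dropped)."""
    return n * d + n * math.sqrt(d / eps)

# Hypothetical problem size, chosen only for illustration.
n, d, eps = 1_000_000, 1_000, 1e-3

# Ratio against the epsilon-dependent term alone: equals sqrt(d / eps),
# the improvement factor quoted in the abstract.
ratio_eps_term = gilbert_cost(n, d, eps) / (n * math.sqrt(d / eps))

# Ratio against the full bound, which also pays n*d to read the input.
ratio_full = gilbert_cost(n, d, eps) / saddle_cost(n, d, eps)

print(f"ratio vs. epsilon term: {ratio_eps_term:.1f}")
print(f"ratio vs. full bound:   {ratio_full:.1f}")
```

The second ratio is smaller than the first because the $nd$ term for reading the data is unavoidable in either approach.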

## Abstract

We study two important SVM variants: hard-margin SVM (for linearly separable cases) and $\nu$-SVM (for linearly non-separable cases). We propose new algorithms from the perspective of saddle point optimization. Our algorithms achieve $(1-\epsilon)$-approximations with running time $\tilde{O}(nd+n\sqrt{d / \epsilon})$ for both variants, where $n$ is the number of points and $d$ is the dimensionality. To the best of our knowledge, the current best algorithm for $\nu$-SVM is based on a quadratic programming approach, which requires $\Omega(n^2 d)$ time in the worst case~\cite{joachims1998making,platt199912}. In this paper, we provide the first nearly linear time algorithm for $\nu$-SVM. The current best algorithm for hard-margin SVM, the Gilbert algorithm~\cite{gartner2009coresets}, requires $O(nd / \epsilon )$ time. Our algorithm improves the running time by a factor of $\sqrt{d}/\sqrt{\epsilon}$. Moreover, our algorithms can be implemented naturally in distributed settings. We prove that our algorithms require $\tilde{O}(k(d +\sqrt{d/\epsilon}))$ communication cost, where $k$ is the number of clients, which almost matches the theoretical lower bound. Numerical experiments support our theory and show that our algorithms converge faster than previous methods on high-dimensional, large, and dense data sets.
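
For context on the baseline the abstract compares against, here is a minimal sketch of a Gilbert-style (Frank-Wolfe) iteration for the distance between two convex hulls, which is the dual view of hard-margin SVM. It is not the paper's saddle point algorithm. The function name, the parameters `iters` and `tol`, and the toy data are assumptions for illustration; points are stored as rows.

```python
import numpy as np

def gilbert_polytope_distance(P, Q, iters=1000, tol=1e-12):
    """Approximate the closest points between conv(P) and conv(Q).

    P and Q are arrays of shape (n1, d) and (n2, d), one point per row.
    Returns a point p in conv(P) and a point q in conv(Q).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p, q = P[0].copy(), Q[0].copy()
    for _ in range(iters):
        w = p - q  # current difference vector; its norm is the distance estimate
        # Linear minimization step: vertices that minimize <w, p' - q'>.
        p_new = P[np.argmin(P @ w)]
        q_new = Q[np.argmax(Q @ w)]
        step = (p_new - q_new) - w
        denom = step @ step
        if denom <= tol:
            break  # the candidate equals the current point: no progress possible
        # Exact line search for min over t in [0, 1] of ||w + t * step||^2.
        t = min(1.0, max(0.0, -(w @ step) / denom))
        if t == 0.0:
            break  # no descent along this direction
        p += t * (p_new - p)
        q += t * (q_new - q)
    return p, q

# Toy data (hypothetical): two triangles whose nearest edges lie on x = 2 and x = -2.
P = np.array([[3.0, 0.0], [2.0, 2.0], [2.0, -2.0]])
Q = np.array([[-3.0, 0.0], [-2.0, 2.0], [-2.0, -2.0]])
p, q = gilbert_polytope_distance(P, Q)
distance = float(np.linalg.norm(p - q))
print(distance)
```

Each iteration scans every point once, which costs $O(nd)$; this is consistent with the $O(nd/\epsilon)$ bound quoted above if on the order of $1/\epsilon$ iterations are needed. The direction `(p - q) / distance` gives the normal of the separating hyperplane.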

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.07252/full.md

## Figures

39 figures with captions in the complete paper: https://tomesphere.com/paper/1705.07252/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1705.07252/full.md

---
Source: https://tomesphere.com/paper/1705.07252