Distributed Multi Class SVM for Large Data Sets

Aruna Govada; Bhavul Gauri; S.K. Sahay

arXiv:1512.00664·cs.DC·December 3, 2015

TL;DR

This paper introduces a distributed multi-class SVM algorithm that constructs a global model from local models, reducing communication costs and improving scalability for large, geographically distributed datasets.

Contribution

It proposes a novel distributed SVM approach that merges local models into a global model, enhancing accuracy and efficiency over centralized and ensemble methods.

Findings

01 Better accuracy than centralized and ensemble methods

02 Significant reduction in training time due to parallel local SVMs

03 Scalable to large datasets of hundreds of thousands of instances

Abstract

Data mining algorithms were originally designed under the assumption that the data is available at one centralized site. These algorithms also assume that the whole dataset fits into main memory while the algorithm runs. In today's scenario, however, the data to be handled is distributed, even geographically. Bringing the data to a centralized site is a bottleneck because the available bandwidth is small compared with the size of the data. In this paper, we propose an algorithm for multi-class SVM that builds a global SVM model by merging the local SVMs using a distributed approach (DSVM). The global SVM is then communicated to each site and made available for further classification. The experimental analysis shows promising results, with better accuracy than both the centralized and ensemble methods. The time complexity is also reduced drastically because of the parallel construction of…
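The abstract does not spell out how the local SVMs are merged, so the following is only a rough sketch of one plausible reading, not the paper's procedure. It assumes each site trains a one-vs-rest linear SVM, sends only its lowest-margin points (a stand-in for support vectors) to a coordinator, and the coordinator retrains on the pooled points to obtain the global model. All names (`train_ovr`, `boundary_points`, `distributed_svm`, `make_site`) and the toy data are hypothetical.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ovr(X, y, classes, lam=0.01, lr=0.05, epochs=30, seed=0):
    """One-vs-rest linear SVMs via SGD on the regularized hinge loss (assumed setup)."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]  # append 1.0 so the last weight acts as a bias
    model = {}
    for c in classes:
        w = [0.0] * len(Xa[0])
        for _ in range(epochs):
            for i in rng.sample(range(len(Xa)), len(Xa)):  # shuffled pass over the data
                t = 1.0 if y[i] == c else -1.0
                violated = t * dot(w, Xa[i]) < 1.0
                w = [wj * (1.0 - lr * lam) for wj in w]    # L2 shrinkage
                if violated:                               # hinge-loss subgradient step
                    w = [wj + lr * t * xj for wj, xj in zip(w, Xa[i])]
        model[c] = w
    return model

def predict(model, x):
    xa = list(x) + [1.0]
    return max(model, key=lambda c: dot(model[c], xa))

def boundary_points(model, X, y, k=5):
    """Indices of the k lowest-margin points per class (approximate support vectors)."""
    keep = set()
    for c, w in model.items():
        margins = [((1.0 if y[i] == c else -1.0) * dot(w, list(X[i]) + [1.0]), i)
                   for i in range(len(X))]
        keep.update(i for _, i in sorted(margins)[:k])
    return sorted(keep)

def distributed_svm(sites, classes, k=5):
    """Assumed merge rule: pool each site's boundary points, then retrain globally."""
    pool_X, pool_y = [], []
    for X, y in sites:  # in a real deployment each iteration would run at its own site
        local = train_ovr(X, y, classes)
        for i in boundary_points(local, X, y, k):
            pool_X.append(X[i])
            pool_y.append(y[i])
    return train_ovr(pool_X, pool_y, classes), pool_X, pool_y

def make_site(seed, n_per_class=20):
    """Synthetic three-class toy data for one site (not from the paper)."""
    rng = random.Random(seed)
    centers = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (0.0, 5.0)}
    X, y = [], []
    for c, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)))
            y.append(c)
    return X, y

classes = [0, 1, 2]
sites = [make_site(s) for s in (1, 2, 3)]
global_model, pool_X, pool_y = distributed_svm(sites, classes)
total = sum(len(X) for X, _ in sites)
print(f"communicated {len(pool_X)} of {total} points")
print("prediction for (5.0, 0.0):", predict(global_model, (5.0, 0.0)))
```

Under these assumptions, only a small subset of each site's points crosses the network, which is the kind of bandwidth saving the abstract motivates. The local training loop is sequential here for simplicity; the parallelism the abstract refers to would come from running each site's training independently.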
