
TL;DR
This paper introduces a distributed map-reduce approach for logistic regression that efficiently handles large-scale data by distributing both samples and parameters across nodes, enabling scalable training in Hadoop environments.
Contribution
It proposes a novel Distributed Parameter Map-Reduce method that distributes parameters alongside data, facilitating scalable logistic regression training in distributed systems.
Findings
Linear acceleration with increasing cluster nodes
Effective logistic regression training on large datasets
Demonstrated in Hadoop production environment
Abstract
This paper describes how to convert a machine learning problem into a series of map-reduce tasks. We study the logistic regression algorithm. Logistic regression assumes that samples are independent and assigns each sample a probability. Parameters are obtained by maximizing the product of all sample probabilities. The rapid growth of training sets poses challenges for machine learning methods. Training samples are now so numerous that they can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression is inference. In the map-reduce spirit, each sample performs inference in a separate map procedure. However, inference requires that the map procedure hold the parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only samples, but also parameters…
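The abstract is cut off before it describes the method, so the following is only a minimal single-process sketch of one way samples and parameters could both be keyed and shuffled, not the paper's actual algorithm. It assumes sparse samples, no bias term, and plain gradient ascent on the log-likelihood; the names `shuffle`, `train_step`, and `log_likelihood` are hypothetical, and the `shuffle` helper merely imitates the grouping that a map-reduce framework performs between the map and reduce phases.

```python
import math
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, imitating the map-reduce shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(samples, weights, learning_rate=0.1):
    """One gradient-ascent step (assumed update rule, not taken from the paper).

    samples: {sample_id: (label, {feature: value})}, label in {0, 1}
    weights: {feature: weight}; missing features are treated as weight 0.0
    """
    # Job 1: key sample entries by feature, so that a reducer needs only the
    # weight of its own feature rather than the full parameter vector.
    by_feature = shuffle(
        (feature, (sample_id, value))
        for sample_id, (_, features) in samples.items()
        for feature, value in features.items()
    )
    partial_products = [
        (sample_id, weights.get(feature, 0.0) * value)
        for feature, entries in by_feature.items()
        for sample_id, value in entries
    ]

    # Job 2: key partial products by sample, sum them into a score, and
    # compute the residual (label minus predicted probability).
    # A real job would carry the label along with the records; here it is
    # simply looked up in the in-memory dict.
    residual = {
        sample_id: samples[sample_id][0] - sigmoid(sum(parts))
        for sample_id, parts in shuffle(partial_products).items()
    }

    # Job 3: key gradient contributions by feature and update each weight.
    gradients = shuffle(
        (feature, residual[sample_id] * value)
        for feature, entries in by_feature.items()
        for sample_id, value in entries
    )
    updated = dict(weights)
    for feature, parts in gradients.items():
        updated[feature] = weights.get(feature, 0.0) + learning_rate * sum(parts)
    return updated


def log_likelihood(samples, weights):
    """Log of the product of sample probabilities, the quantity being maximized."""
    total = 0.0
    for label, features in samples.values():
        score = sum(weights.get(f, 0.0) * v for f, v in features.items())
        p = sigmoid(score)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total
```

In this sketch no step ever needs the whole weight vector in one place: weights are touched only in per-feature groups, and scores only in per-sample groups. The cost is additional shuffle rounds per iteration compared with broadcasting all parameters to every mapper.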
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks
Methods: Logistic Regression
