Faster Learning by Reduction of Data Access Time
Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya

TL;DR
This paper addresses the big data challenge in machine learning by proposing systematic and cyclic sampling methods to reduce data access time, resulting in significantly faster training without sacrificing convergence.
Contribution
It introduces systematic and cyclic sampling techniques for mini-batch selection, demonstrating their effectiveness in speeding up training in empirical risk minimization.
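The two selection schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `stride` parameter, and the wrap-around behaviour are assumptions made for illustration. Systematic sampling is written here in its textbook form (random start, fixed step), and the paper's exact variant may differ.

```python
import numpy as np

def cyclic_batches(n, batch_size):
    """Cyclic/sequential sampling: yield index blocks in fixed order,
    [0, b), [b, 2b), ...; the last block may be shorter."""
    for start in range(0, n, batch_size):
        yield np.arange(start, min(start + batch_size, n))

def systematic_batch(n, batch_size, rng, stride=1):
    """Systematic sampling (assumed textbook form): pick a random start,
    then take every `stride`-th index, wrapping around modulo n."""
    start = rng.integers(n)
    return (start + stride * np.arange(batch_size)) % n

def random_batch(n, batch_size, rng):
    """Baseline for comparison: uniform sampling without replacement,
    which touches scattered positions in the dataset."""
    return rng.choice(n, size=batch_size, replace=False)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for idx in cyclic_batches(10, 4):
        print("cyclic:", idx)
    print("systematic:", systematic_batch(10, 4, rng))
    print("random:", random_batch(10, 4, rng))
```

With `stride=1`, both schemes select neighbouring rows, so a batch can usually be read as one contiguous block rather than gathered from scattered positions. In NumPy, for example, a basic slice such as `X[start:start + b]` returns a view, whereas indexing with an array of random indices copies the selected rows.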
Findings
Training is up to six times faster.
Theoretical convergence proven for proposed sampling methods.
Effective on benchmark datasets under strong convexity and smoothness assumptions.
Abstract
Nowadays, the major challenge in machine learning is the big data challenge. Because big data problems involve a large number of data points, a large number of features in each data point, or both, the training of models has become very slow. The training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second part, i.e., learning from the data. In this paper, we propose one possible solution for handling big data problems in machine learning. The idea is to reduce the training time by reducing the data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of the proposed sampling techniques, we have used Empirical Risk Minimization, which is a commonly used machine learning problem, for strongly convex and…
Taxonomy
Methods: SAGA
