TL;DR
DROP is a dimensionality reduction optimizer that uses the requirements of downstream analytics to terminate stochastic PCA early, yielding speedups of up to 5x over SVD-based PCA in time series processing.
Contribution
The paper presents DROP, a new dimensionality reduction (DR) optimizer that makes stochastic PCA more efficient by taking downstream tasks into account, enabling faster end-to-end time series analysis.
Findings
Up to 5x speedup over SVD-based PCA
Outperforms FFT and PAA by up to 16x in end-to-end workloads
Efficient termination of stochastic PCA based on downstream analytics
Abstract
Dimensionality reduction is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for dimensionality reduction, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, termination conditions for stochastic PCA either execute for a predetermined number of iterations, or until convergence of the solution, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after operating over small (e.g., 1%) subsamples of input data, reducing whole workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of…
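The termination idea described in the abstract can be illustrated with a minimal sketch. This is not DROP's actual procedure: the function names, the sample-doubling schedule, and the use of pairwise-distance preservation as a stand-in for a downstream quality metric are all assumptions made for illustration. The loop fits PCA on a small sample, checks whether the resulting basis is good enough for the downstream task, and only samples more data if it is not.

```python
import numpy as np

def fit_pca_basis(sample, k):
    """Top-k principal directions of a sample via SVD (rows are data points)."""
    centered = sample - sample.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k), orthonormal columns

def distance_preservation(data, basis, rng, n_pairs=200):
    """Mean ratio of projected to original distance over random point pairs.

    Hypothetical stand-in for a downstream quality metric; values near 1.0
    mean the projection keeps pairwise distances almost intact.
    """
    i = rng.integers(0, len(data), size=n_pairs)
    j = rng.integers(0, len(data), size=n_pairs)
    diff = data[i] - data[j]
    orig = np.linalg.norm(diff, axis=1)
    proj = np.linalg.norm(diff @ basis, axis=1)
    valid = orig > 0  # skip identical points to avoid division by zero
    return float(np.mean(proj[valid] / orig[valid]))

def sample_until_good_enough(data, k, target=0.95, start_frac=0.01,
                             growth=2.0, seed=0):
    """Grow the sample until the PCA basis meets the downstream target.

    Assumed schedule: start at start_frac of the rows and multiply the
    sample size by growth each round, stopping at the full dataset.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    frac = start_frac
    while True:
        m = min(n, max(k + 1, int(frac * n)))
        idx = rng.choice(n, size=m, replace=False)
        basis = fit_pca_basis(data[idx], k)
        quality = distance_preservation(data, basis, rng)
        if quality >= target or m == n:
            return basis, m, quality
        frac *= growth
```

On data with strong low-dimensional structure, a loop like this tends to stop after the first small sample, which is the behavior the abstract attributes to accounting for downstream operations. A fixed iteration count or a convergence test on the basis itself would not know when the basis is already adequate for the task.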
