Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao

TL;DR
This paper presents a GPU-accelerated matrix factorization method that combines memory optimization with approximate computing and substantially outperforms existing CPU and GPU solutions on large-scale data.
Contribution
It introduces a GPU-based approach to alternating least squares (ALS) matrix factorization that exploits the GPU memory hierarchy and uses approximate computing to improve efficiency.
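The summary does not say which approximation the paper applies. In ALS, each row update reduces to a small f×f symmetric positive-definite system A x = b, and one common way to approximate such a solve is to run only a few conjugate-gradient (CG) iterations, warm-started from the previous factor, instead of an exact O(f^3) factorization. The sketch below assumes that choice purely for illustration; `cg_solve` and its parameters are hypothetical names, and this is not the authors' GPU code.

```python
import numpy as np

def cg_solve(A, b, x0, max_iters=3, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A.

    Hypothetical helper: runs at most `max_iters` conjugate-gradient
    iterations from x0 (e.g. the factor from the previous ALS sweep)
    instead of an exact factorization. Stops early once the residual
    norm falls below `tol`. Each iteration costs O(f^2) for an f x f A.
    """
    x = np.array(x0, dtype=float)      # copy so the caller's x0 is untouched
    r = b - A @ x                      # current residual
    p = r.copy()                       # first search direction
    rs_old = r @ r                     # squared residual norm
    for _ in range(max_iters):
        if rs_old < tol * tol:         # already accurate enough
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next A-conjugate search direction
        rs_old = rs_new
    return x
```

In exact arithmetic CG reaches the exact solution in at most f iterations, and the A-norm of the error never increases from one iteration to the next, so `max_iters` acts as a knob that trades solution accuracy for time.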
Findings
Outperforms CPU solutions by a large margin
Achieves 2x-4x speedup over state-of-the-art GPU methods
Effectively handles large-scale datasets
Abstract
Matrix factorization (MF) discovers latent features from observations and has shown great promise in collaborative filtering, data compression, feature extraction, word embedding, and other fields. While many problem-specific optimization techniques have been proposed, alternating least squares (ALS) remains popular due to its general applicability (e.g., it easily handles positive-unlabeled inputs), fast convergence, and amenability to parallelization. Current MF implementations are optimized either for a single machine or for a large computer cluster, and both remain insufficient: a single machine provides limited compute power for large-scale data, while multiple machines suffer from a network communication bottleneck. Accelerating ALS on graphics processing units (GPUs) is a promising way to address this challenge. We propose the novel…
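The ALS procedure named in the abstract can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the authors' GPU implementation: the function names, the plain λ‖·‖² regularization, the explicit-feedback loss over observed entries only, and the dense-matrix-plus-boolean-mask representation of the ratings are all choices made here for brevity (the positive-unlabeled case mentioned in the abstract needs a different weighting that is not shown). With one factor matrix held fixed, each row of the other becomes an independent f×f regularized least-squares problem, which is what makes ALS easy to parallelize.

```python
import numpy as np

def solve_rows(R, mask, fixed, lam):
    """One ALS half-step (hypothetical helper): recompute every row factor
    while `fixed` stays constant.

    R     : (m, n) ratings; entries where mask is False are ignored
    mask  : (m, n) boolean, True where a rating is observed
    fixed : (n, f) factor matrix held constant during this half-step
    Row u of the result solves (T_u^T T_u + lam*I) x_u = T_u^T r_u, where
    T_u stacks the rows of `fixed` for the columns observed in row u.
    """
    m, f = R.shape[0], fixed.shape[1]
    out = np.zeros((m, f))
    for u in range(m):                  # rows are independent: the unit of parallelism
        cols = mask[u]
        T = fixed[cols]                 # (n_u, f) factors of the observed columns
        A = T.T @ T + lam * np.eye(f)   # small f x f normal-equation matrix
        b = T.T @ R[u, cols]
        out[u] = np.linalg.solve(A, b)  # exact solve; an approximate solver could go here
    return out

def objective(R, mask, X, Theta, lam):
    """Squared error on observed entries plus L2 regularization (assumed loss)."""
    err = (R - X @ Theta.T)[mask]
    return err @ err + lam * (np.sum(X * X) + np.sum(Theta * Theta))

def als(R, mask, f=2, lam=0.1, iters=20, seed=0):
    """Alternate user-side and item-side least-squares solves (iters >= 1)."""
    rng = np.random.default_rng(seed)
    Theta = 0.1 * rng.standard_normal((R.shape[1], f))  # random item factors
    for _ in range(iters):
        X = solve_rows(R, mask, Theta, lam)       # update users, items fixed
        Theta = solve_rows(R.T, mask.T, X, lam)   # update items, users fixed
    return X, Theta
```

Because each half-step minimizes the objective exactly over one factor matrix, the value returned by `objective` never increases from one sweep to the next. On a GPU, the per-row loop in `solve_rows` is the natural place to assign independent work to thread blocks.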
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques
