Amazon Books Rating prediction & Recommendation Model
Hsiu-Ping Lin, Suman Chauhan, Yougender Chauhan, Nagender Chauhan, Jongwook Woo

TL;DR
This paper presents a machine learning approach using PySpark to predict Amazon book ratings and build a recommendation system, comparing binary and multiclass classification methods for accuracy.
Contribution
It introduces a scalable data pipeline and model tuning techniques for rating prediction and recommendation based on Amazon book data.
Findings
Binary classification yields higher accuracy than multiclass classification.
Hyper-parameter tuning improves model performance.
The recommendation system utilizes multiple book attributes for suggestions.
Abstract
This paper uses an Amazon dataset to predict the ratings of books listed on the Amazon website. As part of this project, we predicted book ratings and also built a recommendation cluster. The recommendation cluster recommends books based on column values from the dataset, for instance category, description, author, price, and reviews. The paper describes a workflow for handling big data files, data engineering, building models, and producing predictions. The models predict the book rating column using various PySpark machine learning APIs. Additionally, we tuned parameters and hyper-parameters, and used cross-validation and TrainValidationSplit for generalization. Finally, we compared the accuracies of binary classification and multiclass classification. We converted our label from multiclass to binary to see if we could find any…
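The multiclass-to-binary label conversion mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the 1 to 5 star scale, the cut-off of 4 stars, the helper names `to_binary` and `accuracy`, and the toy data are all assumptions made for illustration. The sketch only shows how collapsing five classes into two changes the accuracy computed from one fixed set of predictions; the paper trains models on each label type, which this does not reproduce.

```python
from typing import Sequence

# Assumption: ratings are integer stars 1..5 and 4+ stars counts as "positive".
POSITIVE_THRESHOLD = 4  # hypothetical cut-off, not taken from the paper


def to_binary(rating: int, threshold: int = POSITIVE_THRESHOLD) -> int:
    """Collapse a star rating into 1 (positive) or 0 (negative)."""
    return 1 if rating >= threshold else 0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sequence")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy data, invented purely for illustration.
true_stars = [5, 4, 3, 1, 2, 5, 4, 3]
pred_stars = [4, 4, 2, 1, 3, 5, 3, 3]

# Accuracy on the original five-class label.
multiclass_acc = accuracy(true_stars, pred_stars)

# Accuracy after mapping both truth and predictions to the binary label.
binary_acc = accuracy(
    [to_binary(r) for r in true_stars],
    [to_binary(r) for r in pred_stars],
)

print(f"multiclass accuracy: {multiclass_acc:.3f}")
print(f"binary accuracy:     {binary_acc:.3f}")
```

Any prediction that matches the exact star rating also matches after the mapping, so for a fixed set of predictions the binary accuracy can never be lower than the multiclass one. Part of the gap between the two settings may therefore come from the coarser label itself rather than from a better model.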
Taxonomy
Topics: Technology Adoption and User Behaviour
