Synthetic dataset generation methodology for Recommender Systems using statistical sampling methods, a Multinomial Logit model, and a Fuzzy Inference System
Vitor T. Camacho

TL;DR
This paper presents a comprehensive methodology for generating synthetic datasets for recommender systems, combining statistical sampling, a Multinomial Logit model, and fuzzy inference to simulate realistic user-item interactions.
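For readers unfamiliar with the Multinomial Logit model named above, the following is a minimal, generic sketch of how such a model turns item utilities into choice probabilities and samples one interaction. It is not taken from the paper: the utility values, the function names (mnl_probabilities, sample_choice), and the use of Python's standard library are assumptions made purely for illustration.

```python
import math
import random


def mnl_probabilities(utilities):
    """Multinomial Logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)  # subtract the max for numerical stability
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def sample_choice(utilities, rng):
    """Draw one item index according to the MNL probabilities."""
    probs = mnl_probabilities(utilities)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Hypothetical utilities of three items for one synthetic user (assumed values)
utilities = [1.0, 0.5, -0.2]
probs = mnl_probabilities(utilities)

# Simulate 1000 interactions with a seeded generator for reproducibility
rng = random.Random(0)
choices = [sample_choice(utilities, rng) for _ in range(1000)]
```

Items with higher utility are chosen more often, but lower-utility items still receive some interactions, which is the property that makes this family of models a common choice for simulating discrete choices.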
Contribution
It introduces a novel approach that integrates multiple statistical and fuzzy logic techniques to create customizable synthetic datasets for recommender system research.
Findings
Enables generation of datasets with diverse feature types
Simulates realistic user behavior and ratings
Facilitates research without real data constraints
Abstract
It is said that we live in the age of data, and that data is ubiquitous and readily available if one has the tools to harness it. That may well be true, but so is the opposite. It is ever more common to start a data science project only to find oneself without quality data. The causes vary: the needed features were never collected, the data are insufficient, or legal issues stand in the way, and the list goes on. When this happens, either the project is prematurely abandoned, or similar datasets are searched for and used. However, finding a dataset that answers your needs in terms of features, type of ratings, etc., may not be an easy task, and this is particularly the case for recommender systems. In this work, a methodology for the generation of synthetic datasets for recommender systems is presented, making it possible to overcome the obstacle of not having quality data in sufficient amount…
Taxonomy
Topics: Smart Systems and Machine Learning · Customer churn and segmentation
