Doing Data Right: How Lessons Learned Working with Conventional Data should Inform the Future of Synthetic Data for Recommender Systems

Manel Slokom; Martha Larson

arXiv:2110.03275·cs.IR·October 8, 2021

Open Access

TL;DR

This paper argues that research on synthetic data for recommender systems should pay careful attention to dataset design and description, applying lessons learned from conventional data to avoid repeating past mistakes with dataset bias and evaluation, and to make full use of the opportunities synthetic data offers.

Contribution

It highlights the need to 'do data right' by applying past lessons to synthetic data, promoting better dataset practices and exploring new research directions.

Findings

1. Addressing dataset bias can improve evaluation accuracy.
2. Explicit dataset description aids reproducibility and FAIR principles.
3. Synthetic data can support data minimization efforts.

Abstract

We present a case that the newly emerging field of synthetic data in the area of recommender systems should prioritize 'doing data right'. We consider this catchphrase to have two aspects: first, we should not repeat the mistakes of the past, and, second, we should explore the full scope of opportunities presented by synthetic data as we move into the future. We argue that explicit attention to dataset design and description will help to avoid past mistakes with dataset bias and evaluation. In order to fully exploit the opportunities of synthetic data, we point out that researchers can investigate new areas such as using data synthesis to support reproducibility by making data open, as well as FAIR, and to push forward our understanding of data minimization.

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Recommender Systems and Techniques · Data Quality and Management