Algorithm Performance Spaces for Strategic Dataset Selection

Steffen Schulz

arXiv:2505.01442·cs.IR·May 6, 2025

TL;DR

This paper introduces the Algorithm Performance Space, a framework for differentiating datasets based on algorithm performance to improve dataset selection in recommender system research.

Contribution

It proposes a novel framework and three metrics to quantify dataset differences, aiding in more appropriate dataset selection for algorithm evaluation.

Findings

01. Created an Algorithm Performance Space to differentiate datasets

02. Validated the use of three metrics for dataset comparison

03. Demonstrated the framework's potential for diverse dataset selection

Abstract

The evaluation of new algorithms in recommender systems frequently depends on publicly available datasets, such as those from MovieLens or Amazon. Some of these datasets are used disproportionately often, mainly because of their historical popularity as baselines rather than their suitability for a specific research context. This thesis addresses the issue by introducing the Algorithm Performance Space, a novel framework that differentiates datasets based on the measured performance of algorithms applied to them. In an experimental study, it proposes three metrics that quantify and justify the selection of datasets for evaluating new algorithms. These metrics also validate assumptions about datasets, such as the similarity between MovieLens datasets of varying sizes. Creating an Algorithm Performance Space and applying the proposed metrics made it possible to differentiate datasets, and diverse…
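To make the idea concrete, the following is a minimal sketch, not the thesis's implementation. It assumes that each dataset is represented as a point whose coordinates are the scores of a fixed set of algorithms on that dataset, and it uses plain Euclidean distance as a stand-in for the three proposed metrics, which the abstract does not name. The dataset names, algorithm names, and scores are hypothetical placeholders, not results from the paper.

```python
import math
from itertools import combinations

# Hypothetical scores (e.g. a ranking metric between 0 and 1) of three
# algorithms on three datasets. These numbers are illustrative only.
performance = {
    "dataset_a": {"algo_1": 0.30, "algo_2": 0.25, "algo_3": 0.10},
    "dataset_b": {"algo_1": 0.32, "algo_2": 0.27, "algo_3": 0.11},
    "dataset_c": {"algo_1": 0.05, "algo_2": 0.20, "algo_3": 0.40},
}


def to_point(scores, algorithms):
    """Turn one dataset's scores into a coordinate vector (one axis per algorithm)."""
    return [scores[name] for name in algorithms]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pairwise_distances(performance):
    """Distance between every pair of datasets in the performance space."""
    # Use a fixed, sorted algorithm order so all points share the same axes.
    algorithms = sorted(next(iter(performance.values())))
    points = {name: to_point(scores, algorithms) for name, scores in performance.items()}
    return {
        (a, b): euclidean(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


distances = pairwise_distances(performance)
# The pair that lies furthest apart behaves most differently under these algorithms.
most_different = max(distances, key=distances.get)

for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(dist, 4))
```

Under this assumption, two datasets that sit close together (here dataset_a and dataset_b) would add little diversity to an evaluation, while a distant dataset (dataset_c) exercises algorithms differently. The distance measure is the piece most likely to differ from the thesis's actual metrics.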

Code & Models

Repositories

dlay/aps
Official

Taxonomy

Topics: Big Data and Business Intelligence