Exploring the representativeness of the M5 competition data
Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos

TL;DR
This paper assesses whether the M5 competition data is representative of real-world retail sales data by comparing its characteristics with data from other grocery retailers, thus evaluating its generalizability.
Contribution
It provides an analysis of the similarities between M5 data and other retail datasets, supporting its use for broader retail forecasting applications.
Findings
Only small discrepancies were found between M5 and the other retail datasets
Supports the representativeness of M5 data for diverse retail contexts
Enhances confidence in M5-based forecasting methods
Abstract
The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field in order to identify best practices and highlight their practical implications. However, whether the findings of the M5 competition can be generalized and exploited by retail firms to better support their decisions and operation depends on the extent to which the M5 data is sufficiently similar to unit sales data of retailers that operate in different regions, sell different types of products, and consider different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest that there are only…
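The idea of comparing datasets in a feature space can be illustrated with a minimal sketch. It is not the authors' pipeline: the four features (mean demand, squared coefficient of variation of non-zero demand, average inter-demand interval, share of zero-sales periods), the helper names series_features and project_2d, and the synthetic Poisson series are all assumptions chosen for illustration. Each series is summarized as a feature vector, the vectors are standardized, and a PCA projection (via SVD) places both datasets in a common two-dimensional space.

```python
import numpy as np

def series_features(y):
    """Summarize one sales series as [mean, CV^2 of non-zero demand, ADI, share of zeros].

    The feature choice is an illustrative assumption, not the paper's feature set.
    """
    y = np.asarray(y, dtype=float)
    nonzero = y[y > 0]
    if nonzero.size == 0:
        # Series with no sales at all: no demand variability, maximal interval.
        return np.array([0.0, 0.0, float(len(y)), 1.0])
    cv2 = (nonzero.std() / nonzero.mean()) ** 2   # variability of demand sizes
    adi = len(y) / nonzero.size                   # average inter-demand interval
    return np.array([y.mean(), cv2, adi, np.mean(y == 0)])

def project_2d(features):
    """Standardize the feature matrix and project it onto its first two principal components."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0)
    sd[sd == 0] = 1.0                             # guard against constant features
    z = (features - mu) / sd
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:2].T

# Synthetic stand-ins for two retailers' unit sales (hypothetical data).
rng = np.random.default_rng(0)
set_a = [rng.poisson(0.8, size=200) for _ in range(50)]
set_b = [rng.poisson(1.2, size=200) for _ in range(50)]

feats = np.array([series_features(s) for s in set_a + set_b])
coords = project_2d(feats)

# Distance between the two datasets' centroids in the shared feature space.
centroid_gap = np.linalg.norm(coords[:50].mean(axis=0) - coords[50:].mean(axis=0))
print(coords.shape, round(float(centroid_gap), 3))
```

A small centroid gap relative to the spread of the points, or heavily overlapping point clouds, would suggest that the two datasets occupy similar regions of the feature space. A real analysis would use actual sales series and a richer feature set.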
