Distances for Comparing Multisets and Sequences
George Bolt, Simón Lunagómez, Christopher Nemeth

TL;DR
This paper explores and compares various distance measures for multisets and sequences in metric spaces, motivated by sports data analysis, providing theoretical insights and practical applications.
Contribution
It introduces a comprehensive analysis of distances for multisets and sequences, including theoretical properties and extensions, with practical illustration on football data.
Findings
The distances considered differ in their theoretical properties, and extensions are proposed where a desired property fails.
Some distances are more suitable for specific data structures.
An example analysis of in-play football data illustrates the usefulness of these distances in practice.
Abstract
Measuring the distance between data points is fundamental to many statistical techniques, such as dimension reduction or clustering algorithms. However, improvements in data collection technologies have led to a growing versatility of structured data for which standard distance measures are inapplicable. In this paper, we consider the problem of measuring the distance between sequences and multisets of points lying within a metric space, motivated by the analysis of an in-play football data set. Drawing on the wider literature, including that of time series analysis and optimal transport, we discuss various distances which are available in such an instance. For each distance, we state and prove theoretical properties, proposing possible extensions where they fail. Finally, via an example analysis of the in-play football data, we illustrate the usefulness of these distances in practice.
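To make the idea of a distance between sequences of points in a metric space concrete, here is a minimal Python sketch of dynamic time warping (DTW), a standard tool from the time series literature. It is an illustration only and is not taken from the paper: the function names `euclidean` and `dtw_distance`, the choice of Euclidean ground distance, and the toy sequences are all assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Ground distance between two points (assumed Euclidean for this sketch)."""
    return math.dist(p, q)


def dtw_distance(xs, ys, ground=euclidean):
    """Dynamic time warping cost between two non-empty sequences of points.

    `ground` is any distance on the underlying space. This is a generic
    textbook formulation, not necessarily the variant analysed in the paper.
    """
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = minimal warping cost of aligning xs[:i] with ys[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = ground(xs[i - 1], ys[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy sequences of 2-D points.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
print(dtw_distance(a, b))                        # a repeated point is absorbed by the warping
print(dtw_distance([(0.0, 0.0)], [(3.0, 4.0)]))  # single points: reduces to the ground distance
```

In this toy case the two distinct sequences `a` and `b` receive a DTW cost of zero, so plain DTW does not separate all distinct sequences. This is one example of a metric property that a candidate distance can fail to satisfy.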
Taxonomy
Topics: Data Management and Algorithms · Time Series Analysis and Forecasting · Fuzzy Systems and Optimization
