TSI-Bench: Benchmarking Time Series Imputation

Wenjie Du; Jun Wang; Linglong Qian; Yiyuan Yang; Zina Ibrahim; Fanxing Liu; Zepu Wang; Haoxin Liu; Zhiyuan Zhao; Yingjie Zhou; Wenjia Wang; Kaize Ding; Yuxuan Liang; B. Aditya Prakash; Qingsong Wen

arXiv:2406.12747 · cs.LG · November 1, 2024 · 6 cites

TL;DR

TSI-Bench introduces a comprehensive benchmark suite for evaluating deep learning-based time series imputation methods, standardizing experimental settings and exploring transferability from forecasting models to imputation tasks.

Contribution

TSI-Bench is presented as the first systematic benchmark of deep learning algorithms for time series imputation, with standardized evaluation and a paradigm for adapting forecasting models to imputation.

Findings

01

Demonstrates the benchmark's effectiveness across diverse datasets and missingness scenarios.

02

Provides insights into the influence of missing rates and patterns on model performance.

03

Establishes a foundation for future research in time series imputation.
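To make "missing rates and patterns" concrete, the sketch below generates two illustrative missingness masks with NumPy: independent point missingness at a given rate, and contiguous subsequence gaps. It is only an assumption-laden illustration: the function names, the synthetic series, and the choice of these two patterns are hypothetical, and this summary does not list the exact patterns or rates the benchmark uses.

```python
import numpy as np

def point_missing_mask(shape, rate, rng):
    """Return a boolean mask (True = missing) where each entry is
    dropped independently with probability `rate`."""
    return rng.random(shape) < rate

def subsequence_missing_mask(shape, seq_len, n_gaps, rng):
    """Return a boolean mask (True = missing) with `n_gaps` contiguous
    runs of `seq_len` time steps, each in a randomly chosen feature."""
    n_steps, n_features = shape
    mask = np.zeros(shape, dtype=bool)
    for _ in range(n_gaps):
        start = rng.integers(0, n_steps - seq_len + 1)
        feature = rng.integers(0, n_features)
        mask[start:start + seq_len, feature] = True
    return mask

# Hypothetical toy series: 200 time steps, 4 features.
rng = np.random.default_rng(0)
series = rng.normal(size=(200, 4))

point_mask = point_missing_mask(series.shape, rate=0.1, rng=rng)
subseq_mask = subsequence_missing_mask(series.shape, seq_len=20, n_gaps=3, rng=rng)

# Apply a mask by replacing the selected entries with NaN.
corrupted = np.where(point_mask, np.nan, series)
```

Two masks with the same overall rate can stress a model very differently: point missingness leaves nearby observations to interpolate from, while a long gap removes local context entirely.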

Abstract

Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modelling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missing rates and…
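As a rough illustration of what a standardized evaluation setting can look like, the sketch below hides a fraction of known values, imputes them with a simple per-feature mean baseline, and scores the result only on the hidden positions. Everything here is an assumption for illustration: the helper names, the mean baseline, the synthetic data, and the three missing rates are hypothetical and are not taken from the TSI-Bench code.

```python
import numpy as np

def masked_mae(imputed, truth, eval_mask):
    """Mean absolute error computed only on positions that were
    artificially hidden (eval_mask == True)."""
    n_eval = max(int(eval_mask.sum()), 1)
    return float(np.abs(imputed - truth)[eval_mask].sum() / n_eval)

def mean_impute(x):
    """Baseline imputer: fill NaNs with the per-feature mean of the
    observed values."""
    col_mean = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_mean, x)

# Hypothetical fully observed series: 100 time steps, 3 features.
rng = np.random.default_rng(0)
truth = rng.normal(loc=5.0, size=(100, 3))

# Evaluate the same imputer under several missing rates.
results = {}
for rate in (0.1, 0.5, 0.9):
    eval_mask = rng.random(truth.shape) < rate      # True = held out
    observed = np.where(eval_mask, np.nan, truth)   # hide held-out values
    results[rate] = masked_mae(mean_impute(observed), truth, eval_mask)
```

Fixing the seed, the masking procedure, and the metric across methods is what makes the scores comparable; swapping `mean_impute` for another imputer leaves the rest of the loop unchanged.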

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

* The article is well-written, well-presented, and provides a comprehensive overview of the state of the art.
* The idea of a benchmarking platform for time series imputation is excellent (and indeed desirable for other tasks as well, like forecasting); it would be a substantial help to the community and a fair way to evaluate both new and existing algorithms.
* In the description provided by the article, the platform seems to contain the necessary features for such a benchmark.
* The presentatio

Weaknesses

The weaknesses mainly concern the platform's capabilities and its ease of use:
* It is really unfortunate that there was no plan to include static information (such as spatial coordinates for sensor networks, or other types of descriptive attributes). Although many state-of-the-art algorithms for imputation are generic and do not use static information, in many use cases such information is available, and it would be beneficial to allow its use.
* Looking at the code, the platform does not appea

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- This paper provides a unified and systematic framework for evaluating time series imputation models.
- Full results tables in the appendix make the experiments more credible.

Weaknesses

1. Models, especially generative models (VAE, GAN, Diffusion, or others), are outdated. The authors should include more recent methods such as mTANs [1], TimeCIB [2], GRIN [3], DSPD-GP [4], TIDER [5], BiTGraph [6], and many others. (This is not a research paper but a benchmark; I believe the paper should provide a much more comprehensive comparison.)
2. Especially for deep learning methods, the authors don't provide hyperparameters and method-specific settings. For example, which GP kernels did you use? For diffu

Reviewer 03 · Rating 3 · Confidence 4

Strengths

- The paper addresses a crucial need for standardized benchmarking in TSI research, enabling more rigorous and comparable evaluations of different methods.
- TSI-Bench includes a wide range of deep learning models, datasets, and downstream tasks.

Weaknesses

- While a wide range of time-series imputation methods are considered in this study, more recent multiple imputation methods (e.g., using generative models or probabilistic models) are missing (e.g., [A] – [C]). Including these would provide a more complete picture of the current state of time series imputation.
- The adaptation of forecasting methods for imputation seems somewhat unnatural. It appears to involve simply replacing the forecasting output layer with a masking mechanism (similar to

Reviewer 04 · Rating 5 · Confidence 4

Strengths

1. This paper thoroughly summarizes existing benchmark datasets and imputation methods.
2. Experiments are thoroughly conducted to study missingness, different methods, different missing patterns and downstream tasks.

Weaknesses

1. This paper is better suited for a journal as a review paper. Nothing novel is provided. All the methods and datasets are from existing works.
2. The findings provided in this paper are also not surprising, i.e., (1) different missing patterns and rates significantly influence the performance of imputation methods; (2) forecasting architectures demonstrate effectiveness when used as an imputation backbone; (3) imputation enhances both flexibility and effectiveness across downstream tasks.

Taxonomy

Topics: Time Series Analysis and Forecasting