NuwaTS: a Foundation Model Mending Every Incomplete Time Series

Jinguo Cheng; Chunwei Yang; Wanlin Cai; Yuxuan Liang; Qingsong Wen; Yuankai Wu

arXiv:2405.15317·cs.LG·October 3, 2024

TL;DR

NuwaTS is a versatile framework that adapts pre-trained language models for general time series imputation, capable of handling diverse domains and missing patterns with minimal fine-tuning, outperforming domain-specific models.

Contribution

The paper introduces NuwaTS, a novel PLM-based framework for time series imputation that generalizes across domains and variables, with a new benchmarking protocol for cross-domain evaluation.
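To make the contrast between a time-wise split and a split across variables concrete, here is a minimal sketch. It is not the authors' benchmarking code: the function names, the 70/10/20 ratios, and the `sensor_i` variable names are assumptions chosen for illustration.

```python
import random

def time_wise_split(n_steps, ratios=(0.7, 0.1, 0.2)):
    """Chronological split of time indices: every variable appears in all three sets."""
    train_end = int(n_steps * ratios[0])
    val_end = train_end + int(n_steps * ratios[1])
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)

def variable_wise_split(variables, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split of the variables themselves: test variables are never seen in training."""
    shuffled = list(variables)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    train_end = int(len(shuffled) * ratios[0])
    val_end = train_end + int(len(shuffled) * ratios[1])
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

# Hypothetical variable names, for illustration only.
variables = [f"sensor_{i}" for i in range(10)]
train_vars, val_vars, test_vars = variable_wise_split(variables)
train_idx, val_idx, test_idx = time_wise_split(1000)
```

Under the variable-wise split, a model is scored only on series it never saw during training, which is the kind of generalization a time-wise split cannot test.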

Findings

01

NuwaTS outperforms state-of-the-art models on diverse datasets.

02

It generalizes well to unseen variables and domains.

03

The framework also benefits other tasks like forecasting.

Abstract

Time series imputation is critical for many real-world applications and has been widely studied. However, existing models often require specialized designs tailored to specific missing patterns, variables, or domains, which limits their generalizability. In addition, current evaluation frameworks primarily focus on domain-specific tasks and often rely on time-wise train/validation/test data splits, which fail to rigorously assess a model's ability to generalize across unseen variables or domains. In this paper, we present NuwaTS, a novel framework that repurposes Pre-trained Language Models (PLMs) for general time series imputation. Once trained, NuwaTS can be applied to impute missing data across any domain. We introduce specialized embeddings for each sub-series patch, capturing information about the patch, its missing data patterns, and its statistical characteristics. By…
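The three kinds of per-patch information named in the abstract (the patch itself, its missing pattern, and its statistics) can be illustrated with a small sketch. This only prepares raw inputs; it is not the NuwaTS embedding layer, which is not described in this excerpt. The function name `patch_features`, the use of `None` for missing values, zero-filling, and the choice of mean and standard deviation as the statistics are all assumptions.

```python
from statistics import mean, pstdev

def patch_features(series, patch_len):
    """Cut a univariate series (None = missing) into non-overlapping patches and
    collect, per patch, the values, an observation mask, and simple statistics."""
    features = []
    for start in range(0, len(series) - patch_len + 1, patch_len):
        patch = series[start:start + patch_len]
        mask = [0 if v is None else 1 for v in patch]    # 1 = observed, 0 = missing
        observed = [v for v in patch if v is not None]
        features.append({
            "values": [0.0 if v is None else v for v in patch],  # zero-fill gaps (assumption)
            "mask": mask,
            "missing_rate": 1 - sum(mask) / patch_len,
            "mean": mean(observed) if observed else 0.0,
            "std": pstdev(observed) if observed else 0.0,
        })
    return features

series = [1.0, None, 3.0, 4.0, None, None, 7.0, 8.0]
feats = patch_features(series, patch_len=4)
```

In a PLM-based model, each of these pieces would typically be projected into the model's hidden dimension before being combined; that projection is left out here.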

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. Imputation is an important step in almost every real-world time-series analysis, and designing a foundational model for general purpose time-series imputation is both interesting and useful for the community.
2. The proposed domain-specific embedding is novel, and allows for the lightweight (and plug-and-play) domain-specific fine-tuning of the NuwaTS model.

Weaknesses

1. While the paper details a few time-series modeling approaches using pre-trained LMs in related works, the motivation for using them for imputation is missing. See additional details in questions.
2. The paper defines time-series imputation as a univariate problem. It leverages this formulation to introduce a novel benchmarking paradigm (called variable-wise division). However, multivariate time-series imputation has unique advantages over univariate formulations, such as the ability to levera

Reviewer 02 · Rating 3 · Confidence 4

Strengths

1. This paper includes three different settings to use PLMs for time series imputation. Different settings have their own challenges.
2. The paper includes comprehensive experimental results on diverse settings. Ablation studies are also comprehensive.

Weaknesses

1. I couldn't find variances of the results. In my experience, time series imputation performance often depends heavily on the seed settings.
2. Related to 1: the zero-shot performance gain in Table 4 seems marginal compared to PatchTST. Please provide variances.
3. The Table 8 results make me doubt the model. For forecasting (which is an extreme case of imputation), the model seems worse than (or just comparable to) the baselines, yet it uses many more parameters. I know the authors only trained small MLPs bu

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- This paper is well written and organized. I enjoyed reading it.
- I agree with the idea and appreciate the effort to deal with cross-domain generalization, which is important in time-series modeling since the differences in patterns among domains are more significant than in other data modalities.
- The authors provide a well-organized code base for the method, which helps reproducibility and usability.
- An imputation-specific zero-shot model is new, and some designs (e.g. using missing information) c

Weaknesses

- Although this paper introduces several interesting concepts, I think the authors should justify their design choices, since they integrate several existing components and conduct heavy engineering to boost the zero-shot capability, while there are sufficient existing techniques in the time-series forecasting literature that can be extended to the imputation task. I think the performance compared to other LLM-based time series methods is not that impressive.
- I think proposing new bench

Reviewer 04 · Rating 5 · Confidence 3

Strengths

Experiments are conducted thoroughly. Novel designs such as statistics patch embedding, missingness embedding, time series patch contrastive learning, and P-tuning v2 style domain-specific fine-tuning.

Weaknesses

Though several designs are proposed, they are not as effective as the paper claims. In Table 6, I don't think the improvements from these designs are significant enough. Modeling each individual dimension separately is a limitation of the proposed method. Although the authors propose to include inter-series correlation information during fine-tuning for forecasting, they don't show that inter-series correlation information can be used for imputation. I think this method is not applicable to dat

Code & Models

Repositories

chengyui/nuwats
PyTorch · Official

Taxonomy

Topics: Time Series Analysis and Forecasting · Advanced Database Systems and Queries

Methods: Focus · Contrastive Learning