Rough Transformers for Continuous and Efficient Time-Series Modelling

Fernando Moreno-Pino; Álvaro Arroyo; Harrison Waldon; Xiaowen Dong; Álvaro Cartea

arXiv:2403.10288 · stat.ML · March 18, 2024 · 3 cites

Open Access

TL;DR

The paper introduces the Rough Transformer, a continuous-time model for time-series data that reduces computational costs while effectively capturing long-range dependencies, especially useful in medical applications.

Contribution

It proposes the Rough Transformer, a Transformer variant with multi-view signature attention that operates on continuous-time representations of input sequences for efficient, long-range time-series modeling.

Findings

01. Outperforms vanilla attention models in accuracy.

02. Uses significantly less computational time and memory.

03. Effective on both synthetic and real-world data.

Abstract

Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to model irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of moderate lengths and greater. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, critical for addressing long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention,…
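
The abstract is cut off before it defines multi-view signature attention, so the following is background only and not the authors' implementation. In rough path theory, the signature of a path is a sequence of iterated integrals that summarises the path's shape. One property relevant to non-uniformly observed data is that the signature of a piecewise-linear path does not depend on where along each segment the samples fall. The sketch below, assuming NumPy and a hypothetical helper name `signature_depth2`, computes the first two signature levels and checks that property on a small example.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    path: array-like of shape (n_points, d).
    Returns (level1, level2) with shapes (d,) and (d, d).
    Hypothetical helper for illustration; not taken from the paper.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)       # total increment of the path so far
    level2 = np.zeros((d, d))  # second-order iterated integrals so far
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # cross term with the path so far, plus the segment's own term.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


# The same L-shaped curve, observed at its corners only ...
coarse = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
# ... and observed at extra, unevenly spaced points on the same segments.
fine = [[0.0, 0.0], [0.3, 0.0], [1.0, 0.0], [1.0, 0.25], [1.0, 0.9], [1.0, 1.0]]

c1, c2 = signature_depth2(coarse)
f1, f2 = signature_depth2(fine)
print(np.allclose(c1, f1), np.allclose(c2, f2))
```

Both comparisons should agree up to floating-point tolerance, since the extra observations lie on the same straight segments. Higher signature levels follow the same recursive pattern but grow in size as d raised to the level, which is why truncation is used in practice.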

Taxonomy

Topics: Neural Networks and Applications

Methods: Attention Is All You Need · Absolute Position Encodings · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Dropout · Linear Layer · Multi-Head Attention · Byte Pair Encoding