Rough Transformers for Continuous and Efficient Time-Series Modelling
Fernando Moreno-Pino, Álvaro Arroyo, Harrison Waldon, Xiaowen Dong, Álvaro Cartea

TL;DR
The paper introduces the Rough Transformer, a continuous-time model for time-series data that reduces computational costs while effectively capturing long-range dependencies, especially useful in medical applications.
Contribution
The paper proposes the Rough Transformer, a variation of the Transformer that operates on continuous-time representations of input sequences, together with multi-view signature attention, for efficient long-range time-series modeling.
Findings
Outperforms vanilla attention models in accuracy.
Uses significantly less computational time and memory.
Effective on both synthetic and real-world data.
Abstract
Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to model irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of moderate lengths and greater. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, critical for addressing long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention,…
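The abstract is cut off before it describes multi-view signature attention, so the paper's actual mechanism is not reproduced here. As background only, the sketch below computes the depth-2 path signature of a piecewise-linear path, which is the standard object the word "signature" refers to. The function name `signature_depth2` and the choice of depth 2 are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (hypothetical helper).

    path: array of shape (n_points, d), the observed values in order.
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)
    d = path.shape[1]
    level1 = np.zeros(d)        # total increment seen so far
    level2 = np.zeros((d, d))   # iterated integrals seen so far
    for delta in increments:
        # Chen's identity for appending one linear segment
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

# The same path observed at two different sampling densities
coarse = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
fine = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]

s1_coarse, s2_coarse = signature_depth2(coarse)
s1_fine, s2_fine = signature_depth2(fine)
print(np.allclose(s1_coarse, s1_fine), np.allclose(s2_coarse, s2_fine))
```

Both comparisons hold because adding extra sample points along the same linear segments does not change the signature. This property is one reason signature features are a natural fit for data observed at non-uniform intervals, and the output size depends on the channel count and depth rather than on the sequence length.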
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Absolute Position Encodings · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Dropout · Linear Layer · Multi-Head Attention · Byte Pair Encoding
