Grouped self-attention mechanism for a memory-efficient Transformer

Bumjun Jung; Yusuke Mukuta; Tatsuya Harada

arXiv:2210.00440 · cs.LG · October 7, 2022 · 1 citation

TL;DR

This paper introduces a memory-efficient Transformer model with novel Grouped Self-Attention and Compressed Cross-Attention modules, effectively capturing long-range dependencies in time-series data while reducing computational complexity.

Contribution

The paper presents two new modules that enable Transformers to handle long sequences efficiently with linear complexity, improving time-series forecasting performance.

Findings

01. Achieved $O(l)$ complexity with sequence length $l$

02. Model performs comparably or better than existing methods

03. Effectively captures local and global information in time-series data

Abstract

Time-series data analysis is important because numerous real-world tasks, such as forecasting weather, electricity consumption, and stock markets, involve predicting data that vary over time. Time-series data are generally recorded over long observation periods, producing long sequences with periodic characteristics and long-range dependencies. Thus, capturing long-range dependencies is an important factor in time-series forecasting. To address this, we propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA). With both modules, we achieve space and time complexity of order $O(l)$ in the sequence length $l$ under small hyperparameter limitations, while capturing locality and still considering global information. The results of experiments conducted on time-series datasets show that our proposed model efficiently…
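
The abstract states the $O(l)$ result without showing where the savings come from. The sketch below gives one generic way that grouping can make attention cost linear: if each position attends only within a fixed-size group of `group_size` positions, the score tensor has `l * group_size` entries instead of `l * l`. This is an assumption-laden illustration of block-local attention, not the paper's GSA or CCA modules; the function names, the use of the input itself as queries, keys, and values, and the divisibility requirement are all hypothetical simplifications.

```python
import numpy as np


def grouped_attention(x, group_size):
    """Block-local self-attention over fixed-size groups.

    Hypothetical illustration only; NOT the paper's GSA module.
    x has shape (l, d) and l must be divisible by group_size.
    Returns the attended output and the number of score entries computed.
    """
    l, d = x.shape
    assert l % group_size == 0, "sequence length must be divisible by group_size"
    n_groups = l // group_size

    # Split the sequence into groups: (n_groups, group_size, d).
    # For simplicity, queries, keys, and values are all the raw input.
    g = x.reshape(n_groups, group_size, d)

    # One (group_size x group_size) score matrix per group,
    # so l * group_size entries in total rather than l * l.
    scores = g @ g.transpose(0, 2, 1) / np.sqrt(d)

    # Softmax over the key axis, shifted for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ g
    return out.reshape(l, d), scores.size


def full_attention_score_count(l):
    """Number of score entries in standard (dense) self-attention."""
    return l * l


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for l in (64, 128, 256):
        x = rng.normal(size=(l, 8))
        _, grouped = grouped_attention(x, group_size=8)
        print(l, grouped, full_attention_score_count(l))
```

With `group_size` held fixed, doubling `l` doubles the grouped score count but quadruples the dense one. Purely local groups see no global context, which is presumably why the abstract pairs locality with a separate mechanism for global information; that mechanism is not modeled here.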

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Neural Networks and Reservoir Computing