Temporal Convolutional Attention-based Network For Sequence Modeling
Hongyan Hao, Yan Wang, Siqiao Xue, Yudi Xia, Jian Zhao, Furao Shen

TL;DR
This paper introduces the Temporal Convolutional Attention-based Network (TCAN), an architecture that combines temporal convolution with an attention mechanism for sequence modeling. It reports improved language modeling results on word-level PTB, character-level PTB, and WikiText-2.
Contribution
The paper proposes TCAN, integrating temporal convolution and attention with residual connections, offering an effective alternative to recurrent networks for sequence tasks.
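To make the combination of temporal convolution, attention, and residual connections concrete, here is a minimal pure-Python sketch of one block. It is an illustration under stated assumptions, not the paper's implementation: the function names (`causal_attention`, `causal_conv`, `block`), the unparameterized dot-product attention, the scalar convolution taps shared across channels, and the ordering (attention, then convolution, then a plain residual add) are all hypothetical simplifications. In particular, it does not reproduce the paper's Temporal Attention or Enhanced Residual formulations.

```python
import math

def causal_attention(x):
    """Dot-product self-attention where position t attends only to s <= t.

    x is a list of T vectors (lists of d floats). No learned projections
    are used here; that is a simplification for illustration.
    """
    d = len(x[0])
    out, weights = [], []
    for t in range(len(x)):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        row = [e / z for e in exps]           # softmax over the visible prefix
        weights.append(row)
        out.append([sum(row[s] * x[s][i] for s in range(t + 1))
                    for i in range(d)])
    return out, weights

def causal_conv(x, taps, dilation=1):
    """Causal dilated 1-D convolution with implicit left zero padding.

    taps[j] is a scalar weight applied to x[t - j * dilation]; the same
    taps are shared by every channel (an assumed simplification).
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        y = [0.0] * d
        for j, w in enumerate(taps):
            s = t - j * dilation
            if s >= 0:                        # positions before the start count as zero
                for i in range(d):
                    y[i] += w * x[s][i]
        out.append(y)
    return out

def block(x, taps, dilation=1):
    """Attention, then causal convolution, then a residual add of the input."""
    attended, weights = causal_attention(x)
    conv = causal_conv(attended, taps, dilation)
    out = [[xi + ci for xi, ci in zip(xt, ct)] for xt, ct in zip(x, conv)]
    return out, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, weights = block(x, taps=[0.5, 0.25])
```

Because both the attention and the convolution look only at the current and earlier positions, the output at step t does not depend on later inputs, which is the property that lets such a block stand in for a recurrent layer in autoregressive tasks.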
Findings
Achieved state-of-the-art perplexity on word-level PTB
Improved results on character-level PTB
Enhanced performance on WikiText-2
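For readers unfamiliar with the metric behind the word-level result, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below uses the standard definition; the function name `perplexity` and the input format (a list of natural-log token probabilities) are assumptions for illustration and are not taken from the paper's code.

```python
import math

def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every token behaves like a
# uniform choice among four options.
uniform_four = perplexity([math.log(0.25)] * 4)
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, which matches the intuition of being as uncertain as a uniform choice among four words.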
Abstract
With the development of feed-forward models, the default model for sequence modeling has gradually shifted away from recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanisms have been proposed and show more potential for handling sequence modeling tasks. We ask whether there is an architecture that can not only serve as an approximate substitute for recurrent networks but also absorb the advantages of feed-forward models. We therefore propose an exploratory architecture, the Temporal Convolutional Attention-based Network (TCAN), which combines a temporal convolutional network with an attention mechanism. TCAN has two parts: Temporal Attention (TA), which captures relevant features inside the sequence, and Enhanced Residual (ER), which extracts important information from shallow layers and transfers it to deep layers. We improve the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
