Graph Convolutions Enrich the Self-Attention in Transformers!

Jeongwhan Choi; Hyowon Wi; Jayoung Kim; Yehjin Shin; Kookjin Lee; Nathaniel Trask; Noseong Park

arXiv:2312.04234 · cs.LG · November 4, 2024 · 2 cites

TL;DR

This paper reinterprets self-attention as a simple graph filter and, drawing on graph signal processing, introduces a graph-filter-based self-attention mechanism for Transformers. It improves performance across multiple domains at the cost of slightly higher complexity than the original self-attention.

Contribution

It reinterprets self-attention as a graph filter and proposes GFSA, a novel graph-based self-attention method that improves Transformer performance in diverse tasks.

Findings

01 GFSA outperforms traditional self-attention in multiple tasks

02 Increased complexity is justified by performance gains

03 Applicable across NLP, CV, speech, and graph tasks

Abstract

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech…
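The abstract's two central ideas, oversmoothing and self-attention viewed as a graph filter, can be illustrated with a small sketch. This is not the authors' implementation, and this page does not state the exact form of the GFSA filter. The sketch assumes a generic polynomial graph filter (a weighted sum of powers of the attention matrix) and reuses one fixed attention matrix across layers, which a real Transformer would not do. The function names, the toy input, and the filter weights are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_matrix(q, k):
    """Row-stochastic matrix softmax(Q K^T / sqrt(d)).

    Read as a graph: entry (i, j) is the edge weight from token i to token j.
    """
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def polynomial_filter(a, weights):
    """Return sum_k weights[k] * A^k, with A^0 the identity.

    Assumption: a generic polynomial graph filter, used for illustration only.
    """
    n = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for w in weights:
        out = [[out[i][j] + w * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, a)
    return out


def spread(x):
    """Largest per-feature gap between token representations."""
    return max(max(col) - min(col) for col in zip(*x))


# Toy input: 3 tokens with 2 features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = attention_matrix(x, x)

# Plain self-attention applies A itself. Stacking it pulls the token
# representations toward each other, which is the oversmoothing effect.
smoothed = x
for _ in range(8):
    smoothed = matmul(a, smoothed)

# A polynomial filter mixes the identity, A, and A^2 with tunable weights
# (hypothetical values here; in a model they would be learned).
h = polynomial_filter(a, [0.5, 1.0, -0.5])
filtered = matmul(h, x)

print("spread of input:      ", spread(x))
print("spread after 8 x A:   ", spread(smoothed))
print("spread after filter H:", spread(filtered))
```

Setting the weights to `[0.0, 1.0]` recovers the plain attention matrix, so in this sketch ordinary self-attention is the special case of a first-order filter with no identity term.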

Code & Models

Repositories

jeongwhanchoi/gfsa · JAX · Official

Videos

Graph Convolutions Enrich the Self-Attention in Transformers! · SlidesLive

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Sentiment Analysis and Opinion Mining

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Dropout · Softmax · Multi-Head Attention · Byte Pair Encoding · Adam · Absolute Position Encodings · Layer Normalization