Converting Transformers into DGNNs Form

Jie Zhang; Mao-Hsuan Mao; Bo-Wei Chiu; Min-Te Sun

arXiv:2502.00585 · cs.LG · March 5, 2025

TL;DR

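This paper introduces Converter, a method that converts Transformers into Directed Graph Neural Networks (DGNNs) by replacing self-attention with a digraph convolution. It reports superior performance and efficiency on the Long-Range Arena benchmark, long document classification, and DNA sequence-based taxonomy classification.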

Contribution

It formalizes a synthetic unitary digraph convolution, built on the digraph Fourier transform, that converts a Transformer into a DGNN and serves as a lightweight, effective alternative to self-attention.
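The abstract describes the convolution as unitary and based on the digraph Fourier transform. The sketch below illustrates that general idea on the simplest possible digraph, a directed cycle over token positions. This graph is an assumption chosen purely for illustration and is not Converter's actual construction. For a directed cycle, the digraph Fourier basis is the unitary DFT matrix, and a spectral convolution scales each frequency by a filter coefficient g. The helper names (`directed_cycle_adjacency`, `fourier_basis`, `spectral_digraph_conv`) are hypothetical. Reading "unitary" as "every filter coefficient has unit modulus" is also an assumption, not the paper's definition.

```python
import numpy as np

def directed_cycle_adjacency(n: int) -> np.ndarray:
    """Hypothetical toy digraph: node i has a directed edge to node (i + 1) mod n."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
    return A

def fourier_basis(n: int) -> np.ndarray:
    """Unitary matrix U whose columns are eigenvectors of the directed cycle.

    U[m, k] = exp(2*pi*i*m*k/n) / sqrt(n), so A == U @ diag(lam) @ U^H
    with eigenvalues lam[k] = exp(2*pi*i*k/n).
    """
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def spectral_digraph_conv(x: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Filter signal x in the digraph Fourier domain: y = U diag(g) U^H x."""
    U = fourier_basis(x.shape[0])
    x_hat = U.conj().T @ x       # forward digraph Fourier transform
    return U @ (g * x_hat)       # scale each frequency, then transform back

n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(n)

# Assumption: a unit-modulus filter makes the whole operator unitary (norm-preserving).
theta = rng.uniform(0, 2 * np.pi, n)
y = spectral_digraph_conv(x, np.exp(1j * theta))
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))  # True

# Using the eigenvalues of A as the filter reproduces one hop along the directed edges.
lam = np.exp(2j * np.pi * np.arange(n) / n)
A = directed_cycle_adjacency(n)
print(np.allclose(spectral_digraph_conv(x, lam), A @ x))  # True
```

On this particular graph, `U.conj().T @ x` equals `np.fft.fft(x, norm="ortho")` and `U @ z` equals `np.fft.ifft(z, norm="ortho")`. The dense products can therefore be replaced by FFTs costing O(n log n), whereas materializing a full attention matrix costs O(n²). Whether Converter exploits this in the same way is not stated here.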

Findings

1. Converter outperforms standard Transformers on the evaluated benchmarks.
2. It remains computationally efficient and simple.
3. It is effective across diverse tasks, including long document classification and DNA sequence-based taxonomy classification.

Abstract

Recent advances in deep learning have established Transformer architectures as the predominant modeling paradigm. Central to the success of Transformers is the self-attention mechanism, which scores the similarity between query and key matrices to modulate a value matrix. This operation bears striking similarities to digraph convolution, prompting an investigation into whether digraph convolution could serve as an alternative to self-attention. In this study, we formalize this concept by introducing a synthetic unitary digraph convolution based on the digraph Fourier transform. The resulting model, which we term Converter, effectively converts a Transformer into a Directed Graph Neural Network (DGNN) form. We have tested Converter on the Long-Range Arena benchmark, long document classification, and DNA sequence-based taxonomy classification. Our experimental results demonstrate that…
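The abstract's starting observation can be made concrete. The row-wise softmax of the scaled query-key scores is a non-negative, row-stochastic matrix that is generally not symmetric. That is, it is the weighted adjacency matrix of a directed graph over tokens, and multiplying it by V aggregates value vectors along those edges, much like message passing in a graph neural network. The minimal NumPy sketch below implements standard single-head scaled dot-product attention (the baseline from "Attention Is All You Need", not Converter's replacement for it). The names `softmax` and `attention_as_digraph`, the random projection weights, and the sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_digraph(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention, returned as (adjacency, output).

    A[i, j] is the weight of the directed edge carrying token j's value into token i.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # dense weighted digraph over tokens
    return A, A @ V                               # aggregate values along in-edges

rng = np.random.default_rng(0)
n, d_model, d = 5, 8, 4
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d)) for _ in range(3))
A, out = attention_as_digraph(X, Wq, Wk, Wv)

print(np.allclose(A.sum(axis=1), 1.0))  # True: each row is a distribution over in-neighbours
print(np.allclose(A, A.T))              # False: separate Q and K projections make the graph directed
```

Because `Wq` and `Wk` differ and normalization is applied per row, A[i, j] and A[j, i] generally differ. This asymmetry is what makes "directed" graph convolution, rather than undirected graph convolution, the natural point of comparison.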

Code & Models

Repositories

hazdzz/Converter (PyTorch, official)

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Big Data and Digital Economy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Absolute Position Encodings · Dropout · Label Smoothing · Graph Neural Network