Relieving the Over-Aggregating Effect in Graph Transformers
Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang

TL;DR
This paper identifies the over-aggregating problem in graph transformers, which causes information dilution, and proposes Wideformer, a method that divides and guides message aggregation to improve performance.
Contribution
The paper introduces Wideformer, a novel plug-and-play approach that mitigates over-aggregating in graph attention by dividing and guiding message aggregation processes.
Findings
Wideformer effectively reduces over-aggregating.
It improves the discrimination of messages in graph attention.
Results show superior performance over baseline methods.
Abstract
Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, we discover a new phenomenon termed over-aggregating. Over-aggregating arises when a large volume of messages is aggregated into a single node with little discrimination among them, diluting the key messages and potentially causing information loss. To address this, we propose Wideformer, a plug-and-play method for graph attention. Wideformer divides the aggregation over all nodes into parallel processes and guides the model to focus on specific subsets of these processes. The division limits the input volume per aggregation, which avoids message dilution and reduces information loss. The guiding step sorts and weights the aggregation outputs, prioritizing the informative messages. Evaluations show that…
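The abstract describes over-aggregating as softmax dilution and Wideformer as "divide, then guide". The NumPy sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. The abstract does not say how nodes are partitioned or how aggregation outputs are scored, so the following choices are hypothetical: contiguous chunks via `np.array_split`, scoring each chunk by its maximum attention logit, and keeping only the `top_m` highest-scoring chunks after sorting. The function names `key_message_weight` and `divided_guided_attention` are also illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def key_message_weight(num_nodes, key_logit=3.0):
    # Dilution illustration: one "key" node has logit key_logit, all others 0.
    # Its softmax weight shrinks as the number of aggregated nodes grows.
    logits = np.zeros(num_nodes)
    logits[0] = key_logit
    return softmax(logits)[0]

def full_attention(q, k, v):
    # Standard global attention: each query aggregates messages from all N nodes at once.
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, N)
    return softmax(scores) @ v                # (N, d)

def divided_guided_attention(q, k, v, num_groups, top_m):
    """Hypothetical divide-then-guide aggregation (assumed design, not the paper's code).

    Divide: keys/values are split into `num_groups` contiguous chunks, and each query
    runs a separate softmax aggregation per chunk, so each aggregation sees fewer inputs.
    Guide: chunk outputs are scored (assumed: max attention logit in the chunk), sorted
    per query, truncated to the `top_m` best chunks, and combined with softmax weights.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)             # (N, N)
    groups = np.array_split(np.arange(n), num_groups)
    outputs, group_scores = [], []
    for idx in groups:
        s = scores[:, idx]                    # logits restricted to one chunk
        outputs.append(softmax(s) @ v[idx])   # per-chunk aggregation, (N, d)
        group_scores.append(s.max(axis=1))    # assumed chunk score, (N,)
    outputs = np.stack(outputs, axis=1)             # (N, G, d)
    group_scores = np.stack(group_scores, axis=1)   # (N, G)
    # Sort chunks per query (highest score first) and keep only the top_m of them.
    order = np.argsort(-group_scores, axis=1)[:, :top_m]            # (N, M)
    kept_out = np.take_along_axis(outputs, order[:, :, None], axis=1)
    kept_scores = np.take_along_axis(group_scores, order, axis=1)
    weights = softmax(kept_scores)                                  # (N, M)
    return np.einsum("nm,nmd->nd", weights, kept_out)
```

With `num_groups=1` and `top_m=1` the sketch reduces to ordinary full attention, which is a convenient sanity check. With more groups, each softmax competes over fewer messages, so a strong key message keeps more of its weight within its own chunk, in the spirit of the dilution argument above.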
