Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers
Mohamed Amine Hallam, Kuo-Kun Tseng

TL;DR
This paper investigates how different strategies for fusing positional encodings with token embeddings affect Transformer performance, finding that the choice of fusion has little effect on short texts but consistently improves results on long documents.
Contribution
The study provides a controlled empirical comparison of fusion methods in Transformers, demonstrating the importance of fusion design for long-sequence modeling and introducing a lightweight convolutional gating mechanism.
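The listing does not describe how the convolutional gate is built. The sketch below is one plausible reading, under stated assumptions rather than the authors' design: a same-padded depthwise 1D convolution along the sequence axis computes a per-position, per-channel gate logit from the token embeddings, and the sigmoid of that logit mixes token and positional vectors as a convex combination. The names `depthwise_conv1d`, `conv_gated_fusion`, `kernels` and `bias` are hypothetical.

```python
import math
from typing import List

# Hypothetical illustration only; the paper's actual gate may differ.
Matrix = List[List[float]]  # shape (T, d): one row per token position


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def depthwise_conv1d(x: Matrix, kernels: Matrix, bias: List[float]) -> Matrix:
    """Same-padded depthwise 1D convolution along the sequence axis.

    kernels[c] holds the k taps (k odd) for channel c; each channel is
    convolved independently, which keeps the gate lightweight.
    """
    T, d = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out: Matrix = []
    for t in range(T):
        row = []
        for c in range(d):
            acc = bias[c]
            for j in range(k):
                s = t + j - half  # source position for tap j
                if 0 <= s < T:  # zero padding outside the sequence
                    acc += kernels[c][j] * x[s][c]
            row.append(acc)
        out.append(row)
    return out


def conv_gated_fusion(x: Matrix, p: Matrix, kernels: Matrix, bias: List[float]) -> Matrix:
    """Assumed form: h[t][c] = g * x[t][c] + (1 - g) * p[t][c], g = sigmoid(conv(x)[t][c])."""
    logits = depthwise_conv1d(x, kernels, bias)
    fused: Matrix = []
    for lt, xt, pt in zip(logits, x, p):
        row = []
        for l, xi, pi in zip(lt, xt, pt):
            g = sigmoid(l)  # gate in (0, 1), varies with local context
            row.append(g * xi + (1.0 - g) * pi)
        fused.append(row)
    return fused
```

Because the kernel is depthwise, the gate adds only d·k + d parameters, and the convolution lets the mixing weight depend on a small window of neighbouring tokens instead of a single global scalar.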
Findings
Fusion choice has negligible impact on short texts.
Fusion choice yields consistent gains on long documents.
Learnable fusion generalizes across positional encoding types.
Abstract
Transformers require positional encodings to represent sequence order, yet most prior work focuses on designing new positional encodings rather than examining how positional information is fused with token embeddings. In this paper, we study whether the fusion mechanism itself affects performance, particularly in long-sequence settings. We conduct a controlled empirical study comparing three canonical fusion strategies (element-wise addition, concatenation with projection, and scalar gated fusion) under identical Transformer architectures, data splits, and random seeds. Experiments on three text classification datasets spanning short (AG News), medium (IMDB), and long (ArXiv) sequences show that fusion choice has negligible impact on short texts but produces consistent gains on long documents. To verify that these gains are structural rather than stochastic, we perform paired-seed…
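The three fusion strategies named in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names, the projection matrix `w` and the scalar `alpha` are hypothetical, and scalar gated fusion is assumed to take the convex form h_t = g·x_t + (1 − g)·p_t with g = sigmoid(alpha); the paper may parameterize the gate differently.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape (T, d): token embeddings x or positional encodings p


def add_fusion(x: Matrix, p: Matrix) -> Matrix:
    """Element-wise addition: h_t = x_t + p_t (x and p must share width d)."""
    return [[xi + pi for xi, pi in zip(xt, pt)] for xt, pt in zip(x, p)]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product w @ v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def concat_fusion(x: Matrix, p: Matrix, w: Matrix) -> Matrix:
    """Concatenation with projection: h_t = W [x_t ; p_t], W of shape (d, 2d)."""
    return [matvec(w, xt + pt) for xt, pt in zip(x, p)]  # list + list concatenates


def scalar_gated_fusion(x: Matrix, p: Matrix, alpha: float) -> Matrix:
    """Assumed form: one learnable scalar gate g = sigmoid(alpha) shared by all positions."""
    g = 1.0 / (1.0 + math.exp(-alpha))
    return [[g * xi + (1.0 - g) * pi for xi, pi in zip(xt, pt)] for xt, pt in zip(x, p)]
```

The strategies differ mainly in capacity: addition has no parameters of its own, concatenation adds a d × 2d projection that can learn how to weight the two sources per output dimension, and the scalar gate adds a single parameter that sets one global balance between content and position.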
