JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset

Mahesh Godavarti

arXiv:2506.08652 · cs.LG · June 11, 2025

TL;DR

JoFormer is a journey-based Transformer architecture that uses a non-commutative algebra to encode position. On the Tiny Shakespeare character-level language modeling task, it reaches lower perplexity and converges faster than the RoFormer baseline.

Contribution

The paper presents JoFormer, a journey-based Transformer architecture grounded in a non-commutative algebra for composing transformations across positions. It extends relative position representations and recovers rotary transformations as a special case.

Findings

01. JoFormer achieves lower perplexity than RoFormer on Tiny Shakespeare.

02. JoFormer converges faster than RoFormer on the same character-level language modeling task.

03. Composing learnable directional transforms along the input offers a more expressive, principled way to incorporate positional information.

Abstract

Transformers have demonstrated remarkable success in sequence modeling, yet effectively incorporating positional information remains a challenging and active area of research. In this paper, we introduce JoFormer, a journey-based Transformer architecture grounded in a recently proposed non-commutative algebra for composing transformations across positions. JoFormer represents relative positions through learnable directional transforms that are sequentially composed along the input, thereby extending and generalizing existing approaches based on relative position representations. We derive the JoFormer attention mechanism from first principles and show that it subsumes standard methods such as rotary transformations as special cases. To evaluate its effectiveness, we compare JoFormer to the RoFormer baseline on the Tiny Shakespeare character-level language modeling task. Our results…
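To make the idea of sequentially composed transforms concrete, the following is a minimal sketch, not the paper's implementation. It assumes each step transform T_n is an orthogonal matrix, defines the cumulative "journey" P_i = T_1 T_2 ... T_i, and scores a query at position i against a key at position j as <P_i q, P_j k>. Under that assumption the score depends only on the steps between j and i, and using the same 2D rotation at every step gives the familiar rotary form. All function names here (`cumulative`, `score`, `segment`, `rot2`, `rot3`) are hypothetical and do not come from the paper or its repository.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(m, v):
    """Apply matrix m to vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rot2(theta):
    """2x2 rotation matrix (the rotary special case uses one per step)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rot3(theta, axis):
    """3x3 rotation about the x axis (axis=0) or the z axis (any other value)."""
    c, s = math.cos(theta), math.sin(theta)
    if axis == 0:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def cumulative(steps):
    """Return [P_0, ..., P_n] with P_0 = I and P_i = T_1 T_2 ... T_i."""
    out = [identity(len(steps[0]))]
    for t in steps:
        out.append(matmul(out[-1], t))
    return out

def score(q, k, i, j, cum):
    """Unnormalized attention logit <P_i q, P_j k> (assumed form, see lead-in)."""
    return dot(matvec(cum[i], q), matvec(cum[j], k))

def segment(steps, j, i):
    """Product T_{j+1} ... T_i of the steps between positions j and i."""
    out = identity(len(steps[0]))
    for t in steps[j:i]:  # steps[0] holds T_1, so this slice is T_{j+1}..T_i
        out = matmul(out, t)
    return out

# Non-commuting example: alternate rotations about the x and z axes.
steps = [rot3(0.3 * (n + 1), (n % 2) * 2) for n in range(5)]
cum = cumulative(steps)
q, k = [1.0, 2.0, -0.5], [0.3, -1.0, 0.8]

# With orthogonal steps, the logit for (i=4, j=1) can be rebuilt from
# T_2 T_3 T_4 alone: <P_i q, P_j k> = <(T_{j+1}...T_i) q, k>.
lhs = score(q, k, 4, 1, cum)
rhs = dot(matvec(segment(steps, 1, 4), q), k)
print(abs(lhs - rhs) < 1e-9)

# Rotary special case: the same 2D rotation at every step, so the logit
# reduces to q . R((j - i) * theta) k and depends only on the offset.
theta = 0.1
rcum = cumulative([rot2(theta)] * 6)
q2, k2 = [0.7, -0.2], [1.1, 0.4]
rotary = dot(q2, matvec(rot2((2 - 5) * theta), k2))
print(abs(score(q2, k2, 5, 2, rcum) - rotary) < 1e-9)
```

Both comparisons hold up to floating-point error. The first uses 3D rotations about different axes, which do not commute, so the order in which the steps are composed matters there; the second shows that a constant rotation per step collapses to a score that depends only on the relative offset.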

Code & Models

Repositories

mahesh-godavarti/joformer
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · 3D Shape Modeling and Analysis

Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Attention Is All You Need