PaPaformer: Language Model from Pre-trained Parallel Paths

Joonas Tapaninaho; Mourad Oussala

arXiv:2508.00544·cs.CL·August 11, 2025

Open Access

TL;DR

PaPaformer introduces a novel decoder-only transformer architecture with parallel paths, enabling faster training, customization for specific tasks, and reduced computational costs compared to traditional models.

Contribution

The paper presents PaPaformer, a transformer variant with parallel paths that can be trained separately and combined, reducing training time and allowing task-specific customization.

Findings

01

Training time reduced from days to hours.

02

Lower-dimensional paths can be trained independently.

03

Model performance improves with combined paths.

Abstract

The training of modern large language models requires an increasing amount of computation power and time. Even smaller variants, such as small language models (SLMs), take several days to train in the best-case scenarios, often requiring multiple GPUs. This paper explores methods to train and evaluate decoder-only transformer-based language models in hours instead of days or weeks. We introduce PaPaformer, a decoder-only transformer architecture variant whose lower-dimensional parallel paths are combined into a larger model. The paper shows that these lower-dimensional paths can be trained individually with different types of training data and then combined into one larger model. This method gives the option to reduce the total number of model parameters and the training time while increasing performance. Moreover, the use of parallel path structure opens interesting…
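
As a rough illustration of why splitting a model into lower-dimensional parallel paths can reduce the parameter count, the sketch below compares one standard decoder block with several narrower blocks. It relies on the common approximation that a block holds about 12·d² weights (four attention projections plus a feed-forward layer with 4× expansion), ignoring biases, normalization, and embeddings. The function names, the even split of the model dimension across paths, and the omission of any layer that merges the paths are assumptions made for illustration; they are not taken from the paper.

```python
def block_params(d_model: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of one decoder block.

    Counts only the large matrices; biases, layer norms and
    embeddings are ignored (assumption for this sketch).
    """
    attention = 4 * d_model * d_model                 # Q, K, V and output projections
    feed_forward = 2 * ffn_mult * d_model * d_model   # up- and down-projection
    return attention + feed_forward


def parallel_block_params(d_model: int, n_paths: int, ffn_mult: int = 4) -> int:
    """Weight count when the block is split into n_paths narrower blocks.

    Hypothetical setup: each path has width d_model // n_paths, and no
    parameters are spent on combining the paths.
    """
    if n_paths <= 0 or d_model % n_paths != 0:
        raise ValueError("d_model must be divisible by a positive n_paths")
    return n_paths * block_params(d_model // n_paths, ffn_mult)


if __name__ == "__main__":
    single = block_params(512)
    split = parallel_block_params(512, n_paths=4)
    print(f"single block: {single} weights")
    print(f"4 parallel paths: {split} weights")
    print(f"reduction factor: {single / split:.1f}x")
```

Under these assumptions the weight count falls by a factor equal to the number of paths, because each path's cost scales with the square of its width. The actual savings in PaPaformer depend on design details that this page does not describe.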

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling