Hybrid Dual-Path Linear Transformations for Efficient Transformer Architectures

Vladimer Khasia

arXiv:2602.07070 · cs.LG · February 10, 2026

TL;DR

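This paper introduces the Hybrid Dual-Path Linear (HDPL) operator, which combines a local sparse (block-diagonal) transformation with a global low-rank transformation inside Transformer projections. The reported result is a model with 6.8% fewer parameters and lower validation loss than a standard Llama baseline, plus a probabilistic latent space intended to support interpretability.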

Contribution

The paper proposes the Hybrid Dual-Path Linear (HDPL) operator, which decomposes each affine transformation into a local pathway and a global pathway, with the aim of improving both efficiency and representational power in Transformer architectures.

Findings

1. Outperforms standard Llama baseline on FineWeb-Edu dataset

2. Reduces parameter count by 6.8% while improving validation loss

3. Provides a probabilistic latent space for interpretability and control
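
To make the parameter-count finding concrete, here is a rough sketch of how the weights of a single projection could be counted under a dense layer and under a dual-path layer. The function names, the layer sizes, the block count, and the rank are all hypothetical, and biases are ignored. It also assumes the low-rank path uses two encoder matrices (mean and log-variance) and one decoder matrix, which this page does not confirm. The 6.8% figure above is the paper's whole-model number and is not reproduced here.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weights in a standard dense projection (bias ignored)."""
    return d_in * d_out


def dual_path_params(d_in: int, d_out: int, n_blocks: int, rank: int) -> int:
    """Weights in a hypothetical dual-path projection (bias ignored).

    Local path: n_blocks independent blocks on the diagonal.
    Global path: assumed VAE-style bottleneck with two encoders
    (mean, log-variance) and one decoder.
    """
    assert d_in % n_blocks == 0 and d_out % n_blocks == 0
    local = n_blocks * (d_in // n_blocks) * (d_out // n_blocks)
    global_path = 2 * d_in * rank + rank * d_out
    return local + global_path


# Illustrative sizes only; not the configuration used in the paper.
dense = dense_params(1024, 1024)
dual = dual_path_params(1024, 1024, n_blocks=4, rank=64)
print(dense, dual, dual / dense)
```

Per-projection savings like this do not translate directly into a whole-model reduction, because the abstract states that the Output and Down projections stay dense, and embeddings and other parameters are unaffected.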

Abstract

Standard Transformer architectures rely heavily on dense linear transformations, treating feature projection as a monolithic, full-rank operation. We argue that this formulation is inefficient and lacks the structural inductive bias necessary for distinguishing between local feature preservation and global context integration. To address this, we introduce the Hybrid Dual-Path Linear (HDPL) operator, which decomposes the affine transformation into two topologically distinct pathways: a sparse block-diagonal component for high-rank local processing, and a low-rank Variational Autoencoder (VAE) bottleneck for global context regularization. By "surgically" replacing specific projections (Query, Key, Value, Gate, Up) with HDPL operators while retaining standard dense layers for aggregation (Output, Down), we achieve a superior balance of efficiency and representational power. Experiments on…
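
The two pathways described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation. It assumes that the two paths are combined by addition, that the VAE bottleneck uses a diagonal Gaussian with a standard-normal prior, and that biases are omitted. The function name `hdpl_forward` and all argument names are hypothetical.

```python
import numpy as np


def hdpl_forward(x, blocks, w_mu, w_logvar, w_dec, rng=None):
    """Sketch of a dual-path linear map (assumed form, not the paper's code).

    x:        (batch, d_in) input features
    blocks:   (n_blocks, d_in // n_blocks, d_out // n_blocks) local weights
    w_mu:     (d_in, rank) encoder for the latent mean
    w_logvar: (d_in, rank) encoder for the latent log-variance
    w_dec:    (rank, d_out) decoder from the latent back to features
    rng:      numpy Generator; if None, the latent mean is used (no sampling)
    """
    n_blocks, b_in, b_out = blocks.shape
    batch = x.shape[0]

    # Local path: each input chunk is mapped by its own block, which is
    # equivalent to multiplying by a block-diagonal matrix.
    x_chunks = x.reshape(batch, n_blocks, b_in)
    local = np.einsum("bni,nio->bno", x_chunks, blocks)
    local = local.reshape(batch, n_blocks * b_out)

    # Global path: low-rank bottleneck with a Gaussian latent.
    mu = x @ w_mu
    logvar = x @ w_logvar
    if rng is not None:
        # Reparameterization: z = mu + sigma * eps
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    else:
        z = mu
    global_out = z @ w_dec

    # KL divergence of N(mu, sigma^2) from N(0, I), one value per example.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

    # Assumption: the two paths are summed.
    return local + global_out, kl
```

With `rng=None` the operator reduces to multiplication by a single matrix, a block-diagonal part plus the low-rank product `w_mu @ w_dec`, which shows how a sparse high-rank term and a dense low-rank term could together stand in for one dense projection.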
