CoFrGeNet: Continued Fraction Architectures for Language Generation

Amit Dhurandhar; Vijil Chenthamarakshan; Dennis Wei; Tejaswini Pedapati; Karthikeyan Natesan Ramamurthy; Rahul Nair

arXiv:2601.21766·cs.CL·May 5, 2026

TL;DR

This paper introduces CoFrGeNets, an architecture family inspired by continued fractions. Its components replace Multi-head Attention and Feed-Forward Networks in Transformer blocks, using fewer parameters while remaining competitive on language tasks.

Contribution

The authors propose a novel function class, and architectural components built on it, for language models. These components reduce parameter count and training time while maintaining or improving performance.

Findings

01  Models built with CoFrGeNet components are competitive on language tasks.

02  They achieve similar or better results with fewer parameters.

03  They require less pre-training time than the original Transformers.

Abstract

Transformers are arguably the preferred architecture for language generation. In this paper, inspired by continued fractions, we introduce a new function class for generative modeling. The architecture family implementing this function class is named CoFrGeNets - Continued Fraction Generative Networks. We design novel architectural components based on this function class that can replace Multi-head Attention and Feed-Forward Networks in Transformer blocks while requiring far fewer parameters. We derive custom gradient formulations to optimize the proposed components more accurately and efficiently than with standard PyTorch-based gradients. Our components are plug-in replacements that require little change to the training or inference procedures already in place for Transformer-based models, making our approach easy to incorporate into large industrial workflows. We…
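To make the phrase "function class inspired by continued fractions" concrete, here is a minimal sketch of evaluating a finite continued fraction a0 + 1/(a1 + 1/(a2 + ...)). It is an illustration only, not the paper's component: the names `continued_fraction` and `linear_ladder`, the choice of making each term a dot product with the input, and the `eps` safeguard on small denominators are all assumptions made here for the example.

```python
from typing import Sequence


def continued_fraction(terms: Sequence[float], eps: float = 1e-6) -> float:
    """Evaluate a0 + 1/(a1 + 1/(a2 + ...)) from the innermost term outward."""
    value = terms[-1]
    for a in reversed(terms[:-1]):
        # Assumed safeguard (not taken from the paper): keep the
        # denominator away from zero before taking its reciprocal.
        if abs(value) < eps:
            value = eps if value >= 0 else -eps
        value = a + 1.0 / value
    return value


def linear_ladder(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    eps: float = 1e-6,
) -> float:
    """Hypothetical 'ladder': term a_k is the dot product of weights[k] and x."""
    terms = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return continued_fraction(terms, eps)


# 1 + 1/(2 + 1/2) = 1.4
print(continued_fraction([1.0, 2.0, 2.0]))
```

In this sketch a ladder of depth d over an n-dimensional input uses d weight vectors of length n, which hints at why such a function class can be parameter-light. How the actual CoFrGeNet components are parameterized and trained is described in the paper itself.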
