Structural Latency Perturbation in Large Language Models Through Recursive State Induction

Michael Mangrum; Jonathan Pemberton; Benedict Wetherby; Philip Montague

arXiv:2502.00758 · cs.CL · March 26, 2025

TL;DR

This paper introduces a recursive state induction method that dynamically reduces inference latency in large language models while maintaining generative quality, improving computational efficiency and reducing power consumption.

Contribution

The paper presents a novel structured latency perturbation mechanism, based on recursive state induction, that reduces inference latency without altering the model architecture.

Findings

01. Reduces inference latency across varying sequence lengths
02. Maintains token retention and memory utilization
03. Improves power efficiency during extended text generation

Abstract

Computational efficiency remains a critical consideration in scaling high-capacity language models, with inference latency and resource consumption imposing significant constraints on real-time applications. This study introduces a structured latency perturbation mechanism that modifies computational pathways through recursive state induction, enabling dynamic suppression of redundant activations while preserving generative fidelity. A formal mathematical framework describes the recursive perturbations and ensures that the modifications remain adaptive rather than statically imposed. Experiments show that applying recursive state adjustments reduces inference latency across varying sequence lengths, with longer generations benefiting from cumulative efficiency improvements. Comparative evaluations against structured pruning and quantization have…
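The abstract describes the mechanism only at a high level. The following is a minimal sketch that illustrates the general idea of adaptive, state-dependent suppression of activations, as opposed to a statically imposed pruning mask. It is not the authors' formulation. It assumes that the "recursive state" is a per-unit exponential moving average of activation magnitudes and that units are gated when their state falls below a threshold relative to the mean state. `RecursiveGate`, `alpha`, and `tau` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecursiveGate:
    """Hypothetical gate (not the paper's method): keeps a per-unit recursive
    state and zeroes units whose state falls below an adaptive threshold."""
    alpha: float = 0.9  # decay of the recursive state (assumed value)
    tau: float = 0.5    # gate units whose state is below tau * mean state (assumed)
    state: List[float] = field(default_factory=list)

    def step(self, h: List[float]) -> Tuple[List[float], int]:
        """Apply one decoding step; return gated activations and the count of suppressed units."""
        if not h:
            return [], 0
        if not self.state:
            # Initialise the recursive state from the first activation vector.
            self.state = [abs(x) for x in h]
        else:
            # Recursive update: s_t = alpha * s_{t-1} + (1 - alpha) * |h_t|
            self.state = [self.alpha * s + (1 - self.alpha) * abs(x)
                          for s, x in zip(self.state, h)]
        # The threshold is recomputed from the current state at every step,
        # so the mask adapts over time instead of being fixed in advance.
        threshold = self.tau * sum(self.state) / len(self.state)
        mask = [s >= threshold for s in self.state]
        gated = [x if keep else 0.0 for x, keep in zip(h, mask)]
        return gated, mask.count(False)


gate = RecursiveGate()
print(gate.step([1.0, 0.01, 0.8, 0.02]))  # units 1 and 3 are suppressed
print(gate.step([0.5, 0.5, 0.5, 0.5]))    # still suppressed: the state carries history
```

In the second call the input is uniform, yet units 1 and 3 remain gated. This happens because the gate's decision depends on the accumulated state rather than on the current step alone. A static pruning mask, by contrast, would be chosen once before inference and never revisited.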

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Pruning