Structural Latency Perturbation in Large Language Models Through Recursive State Induction
Michael Mangrum, Jonathan Pemberton, Benedict Wetherby, Philip Montague

TL;DR
This paper introduces a recursive state induction method that dynamically reduces inference latency in large language models, maintaining generative quality while improving efficiency and lowering power consumption.
Contribution
It presents a novel structured latency perturbation mechanism based on recursive state induction that reduces inference latency without altering model architecture.
Findings
Reduces inference latency across varying sequence lengths
Maintains token retention and memory utilization
Improves power efficiency during extended text generation
Abstract
Computational efficiency has remained a critical consideration in scaling high-capacity language models, with inference latency and resource consumption presenting significant constraints on real-time applications. The study has introduced a structured latency perturbation mechanism that modifies computational pathways through recursive state induction, enabling dynamic suppression of redundant activations while preserving generative fidelity. A formal mathematical framework has been established to describe recursive perturbations, ensuring that modifications remain adaptive rather than statically imposed. Experiments have demonstrated that applying recursive state adjustments reduces inference latency across varying sequence lengths, with longer text generations benefiting from cumulative efficiency improvements. Comparative evaluations against structured pruning and quantization have…
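The abstract does not give the formal framework, so the following is only a toy illustration of the general idea of a recursively updated state that gates low-activity units. It is not the authors' method. The function names (`induce_state`, `suppress`, `run`), the exponential-decay update, and the `decay` and `threshold` parameters are all assumptions made for this sketch.

```python
# Toy sketch only: a running state is updated recursively at every step and
# used to zero out units whose recent activity is low. All names and the
# update rule are hypothetical, not taken from the paper.

def induce_state(prev_state, activations, decay=0.9):
    """Blend the previous state with the magnitude of the new activations."""
    return [decay * s + (1.0 - decay) * abs(a)
            for s, a in zip(prev_state, activations)]


def suppress(activations, state, threshold=0.1):
    """Keep an activation only if its running state reaches the threshold."""
    return [a if s >= threshold else 0.0
            for a, s in zip(activations, state)]


def run(steps, decay=0.9, threshold=0.1):
    """Apply the recursive state update and suppression to each step in turn."""
    state = [abs(a) for a in steps[0]]  # assumed initialisation from the first step
    outputs = []
    for activations in steps:
        state = induce_state(state, activations, decay)
        outputs.append(suppress(activations, state, threshold))
    return outputs


if __name__ == "__main__":
    # The second unit stays weak, so its running state never reaches the threshold.
    print(run([[1.0, 0.01], [1.0, 0.01]]))
```

In this sketch the suppression decision depends on the accumulated state rather than on a fixed mask, which is one way to read "adaptive rather than statically imposed". A real implementation would need to skip the suppressed computation, not just zero its output, to save any latency.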
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Pruning
