Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models
Yilun Zhang

TL;DR
This paper introduces Cognitive Load-Aware Inference (CLAI), a neuro-symbolic framework that applies cognitive theories to optimize large language model inference, reducing token usage by up to 45% while maintaining accuracy on complex reasoning tasks.
Contribution
It formalizes cognitive load metrics for LLMs and proposes two methods, CLAI-Prompt and CLAI-Tune, to improve inference efficiency based on cognitive principles.
Findings
Up to 45% reduction in token consumption
Maintains accuracy across complex reasoning tasks
Emergent problem decomposition ability in CLAI-Tune
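To make the headline figure concrete, here is a minimal sketch of how a relative token reduction could be computed from two token counts. The function name `token_reduction` and the example counts are hypothetical and chosen for illustration only. They are not taken from the paper, and the paper may measure token consumption differently.

```python
def token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Return the fractional reduction in token count relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - optimized_tokens / baseline_tokens


# Illustrative numbers only; they are not results from the paper.
baseline = 2000   # tokens consumed by an unoptimized run (hypothetical)
optimized = 1100  # tokens consumed by an optimized run (hypothetical)

print(f"Token reduction: {token_reduction(baseline, optimized):.0%}")
```

Under this definition, a run that drops from 2000 to 1100 tokens corresponds to a 45% reduction. A figure like this is only meaningful when read alongside accuracy on the same tasks.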
Abstract
The escalating computational costs of Large Language Model (LLM) inference have become a critical barrier to their widespread and sustainable deployment. While existing optimization strategies are effective, they are predominantly based on statistical heuristics or architectural modifications, lacking a guiding cognitive theory to manage the inference process itself. This paper aims to bridge this gap by introducing a novel paradigm: the Cognitive Load-Aware Inference (CLAI) framework, which operationalizes principles from Cognitive Load Theory (CLT) and neuroscience for LLM inference. We formalize the concepts of Intrinsic Cognitive Load, Extraneous Cognitive Load, and Germane Cognitive Load into quantifiable LLM metrics, thereby reframing the inference process as a cognitive economics optimization problem: based on the intrinsic complexity…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
