You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference
Ryan Shamim

TL;DR
This paper introduces Meaning-First Execution (MFEE), a control-plane approach that uses semantic analysis to decide when transformer inference is actually needed, reducing transformer executions by 78.1% without sacrificing correctness.
Contribution
MFEE is a novel control-plane architecture that invokes transformer inference selectively without modifying models, weights, or parameters, cutting executions by 78.1% while preserving correctness.
Findings
MFEE reduces transformer executions by 78.1% across 1,000 diverse prompts under deterministic decoding.
In the comparative evaluation, pattern-based routers achieve at most 53.3% avoidance and incur correctness failures.
In the same evaluation, MFEE achieves 100% avoidance with zero correctness failures.
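The headline figure can be read as a simple ratio of skipped executions to total prompts. The sketch below assumes that "execution reduction" means the fraction of prompts for which the transformer is not invoked; under that reading, 78.1% of 1,000 prompts corresponds to 781 skipped and 219 invoked executions. The function name `avoidance_rate` is hypothetical and the 219/781 split is derived from the reported percentage, not a count stated in the listing.

```python
def avoidance_rate(total_prompts: int, invoked: int) -> float:
    """Fraction of prompts for which transformer inference was skipped.

    Assumes reduction is measured per prompt: skipped / total.
    """
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    if not 0 <= invoked <= total_prompts:
        raise ValueError("invoked must be between 0 and total_prompts")
    skipped = total_prompts - invoked
    return skipped / total_prompts


# 1,000 prompts with 219 invoked executions (derived from the reported 78.1%).
rate = avoidance_rate(1000, 219)
print(f"{rate:.1%}")
```

Under this reading, a router with "100% avoidance" would invoke the model on zero prompts of the evaluated set, which is why avoidance must always be reported together with correctness.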
Abstract
Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determining when execution is necessary versus when correctness can be preserved through alternative pathways. We introduce Meaning-First Execution (MFEE), a control-plane architecture that implements this framework by invoking transformer inference only when it is required. MFEE operates as a gating layer above existing stacks without modifying models, weights, or parameters. Across 1,000 diverse prompts under deterministic decoding, MFEE achieves 78.1% execution reduction while maintaining 100% exact-match equivalence for invoked executions. Comparative evaluation reveals that pattern-based routers achieve at most 53.3% avoidance with correctness failures, while MFEE reaches 100% avoidance with zero failures…
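To make the control-plane framing concrete, here is a minimal Python sketch of a gating layer that wraps an unmodified model callable. It is not MFEE's implementation: the abstract does not describe how MFEE's semantic analysis reaches its decision, so `GatedExecutor`, the `resolver` hook, and `toy_model` are hypothetical names, and the resolver here is a plain lookup table standing in for whatever alternative pathway MFEE actually uses. The point illustrated is only the structure: the gate decides first, and the model is called, unchanged, only when the gate cannot answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical hooks: a resolver tries to answer without the model and
# returns None when it cannot guarantee a correct answer.
Resolver = Callable[[str], Optional[str]]
Model = Callable[[str], str]


@dataclass
class GatedExecutor:
    """Control-plane wrapper sitting above an unmodified model callable."""

    model: Model
    resolver: Resolver
    invoked: int = 0
    skipped: int = 0

    def run(self, prompt: str) -> str:
        answer = self.resolver(prompt)
        if answer is not None:
            # Alternative pathway handled the prompt; no inference needed.
            self.skipped += 1
            return answer
        # Fall through to the model; its weights and parameters are untouched.
        self.invoked += 1
        return self.model(prompt)


def toy_model(prompt: str) -> str:
    # Stand-in for deterministic transformer decoding.
    return prompt.upper()


# Hypothetical resolver: a lookup table; dict.get returns None on a miss.
known_answers = {"ping": "pong"}
gate = GatedExecutor(model=toy_model, resolver=known_answers.get)
outputs = [gate.run(p) for p in ["ping", "hello", "ping"]]
print(outputs, gate.invoked, gate.skipped)
```

Because the model sits behind a plain callable, the same wrapper could front any existing inference stack; correctness then depends entirely on the resolver returning `None` whenever it is not certain, which is the property the paper's exact-match evaluation is meant to check.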
Taxonomy
Topics: Software-Defined Networks and 5G · Network Packet Processing and Optimization · Advanced Neural Network Applications
