TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge
Khurram Khalil, Khaza Anuarul Hoque

TL;DR
TOGGLE introduces a formal, logic-guided approach to compress large language models, ensuring linguistic properties are preserved while significantly reducing computational costs for edge deployment.
Contribution
TOGGLE is described as the first framework to integrate formal temporal logic specifications into LLM compression, enabling verifiable, property-preserving model reduction without retraining.
Findings
Achieves up to 3.3x FLOPs reduction.
Reduces model size by up to 68.8%.
Compressed models continue to satisfy the specified linguistic properties.
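To put the two headline numbers on a common scale, the short sketch below converts a percentage size reduction into a compression ratio and an N-fold FLOPs reduction into a percentage saved. The function names are illustrative and not taken from the paper; only the figures 68.8% and 3.3x come from the findings above.

```python
def size_reduction_to_ratio(reduction_pct: float) -> float:
    """Convert a percentage size reduction into original/compressed ratio."""
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


def flops_factor_to_saving_pct(factor: float) -> float:
    """Convert an N-fold FLOPs reduction into the percentage of FLOPs saved."""
    return (1.0 - 1.0 / factor) * 100.0


# Figures reported in the findings (both are "up to" values).
size_ratio = size_reduction_to_ratio(68.8)      # roughly 3.2x smaller
flops_saved = flops_factor_to_saving_pct(3.3)   # roughly 70% fewer FLOPs

print(f"size ratio: {size_ratio:.2f}x, FLOPs saved: {flops_saved:.1f}%")
```

Both figures are upper bounds, so they need not come from the same model or configuration.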
Abstract
Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language Model Compression (TOGGLE), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating compressed models that formally satisfy specified linguistic constraints without retraining or fine-tuning. Evaluating TOGGLE on four LLM architectures (GPT-2,…
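The idea of a robustness-guided search over layer-wise settings can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the property is a toy STL formula "always (signal >= threshold)", whose robustness is the smallest margin over time; `toy_quality_signal` is a hypothetical stand-in for measuring a real model; and plain random search replaces Bayesian optimization to keep the example dependency-free.

```python
import random
from typing import List, Optional, Sequence, Tuple

BIT_CHOICES = (4, 8, 16)  # assumed per-layer bit-width options


def robustness_always_geq(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G(signal >= threshold): the minimum margin over time.

    A non-negative value means the property holds at every step.
    """
    return min(s - threshold for s in signal)


def toy_quality_signal(bits: Sequence[int], steps: int = 8) -> List[float]:
    """Hypothetical quality trace; lower bit-widths degrade it over time."""
    penalty = sum((16 - b) / 16 for b in bits) / len(bits)
    return [1.0 - penalty * (0.3 + 0.02 * t) for t in range(steps)]


def relative_cost(bits: Sequence[int]) -> float:
    """Model cost relative to an all-16-bit baseline (1.0 = uncompressed)."""
    return sum(bits) / (16 * len(bits))


def search(num_layers: int, threshold: float, trials: int,
           seed: int = 0) -> Optional[Tuple[float, float, List[int]]]:
    """Random search (a stand-in for Bayesian optimization).

    Returns (cost, robustness, bits) for the cheapest configuration whose
    robustness is non-negative, or None if no sampled one is feasible.
    """
    rng = random.Random(seed)
    best: Optional[Tuple[float, float, List[int]]] = None
    for _ in range(trials):
        bits = [rng.choice(BIT_CHOICES) for _ in range(num_layers)]
        rho = robustness_always_geq(toy_quality_signal(bits), threshold)
        if rho < 0:
            continue  # property violated: reject this configuration
        cost = relative_cost(bits)
        if best is None or cost < best[0]:
            best = (cost, rho, bits)
    return best


result = search(num_layers=4, threshold=0.8, trials=200)
print(result)
```

The design point this sketch tries to capture is that robustness is a signed margin rather than a pass/fail bit, so a search procedure can tell how close a configuration is to violating the property as well as whether it does.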
Taxonomy
Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Topic Modeling
