Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models
Yan Wang, Xiaoning Li, Tien Nguyen, Shaohua Wang, Chao Ni, Ling Ding

TL;DR
This paper introduces SlimCode, a model-agnostic code simplification method that reduces computational complexity and improves efficiency for large language models in code tasks, without relying on model-specific attention patterns.
Contribution
We propose SlimCode, a novel, model-agnostic code simplification approach based on the nature of the input tokens, which improves efficiency and performance across multiple LLMs and tasks.
Findings
SlimCode improves code search and summarization metrics by over 5%.
It reduces GPT-4 API costs by up to 24%.
SlimCode is 133 times faster than previous methods.
Abstract
Pre-trained Large Language Models (LLMs) have achieved remarkable success in several domains. However, code-oriented LLMs are computationally heavy, with a cost that grows quadratically with the length of the input. To simplify the input program of an LLM, the state-of-the-art approach filters the input code tokens based on the attention scores given by the LLM. The decision to simplify the input should not rely on the attention patterns of an LLM, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input belongs, the outcome may differ when the model is pre-trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of input code tokens. As an empirical…
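To make the idea of filtering by token nature (rather than by attention scores) more concrete, here is a minimal Python sketch. It is not the SlimCode algorithm: the category names, the priority order in PRIORITY, and the helper names categorize and simplify are all assumptions made for illustration, and Python's standard tokenize module is used only because it makes the example self-contained.

```python
import io
import keyword
import tokenize

# Hypothetical priorities (higher = kept longer). This ordering is an
# assumption for illustration, not the ranking reported in the paper.
PRIORITY = {"symbol": 0, "keyword": 1, "literal": 2, "identifier": 3}


def categorize(tok):
    """Map a token to a coarse category using only the token itself."""
    if tok.type == tokenize.OP:
        return "symbol"
    if tok.type == tokenize.NAME:
        return "keyword" if keyword.iskeyword(tok.string) else "identifier"
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return "literal"
    return None  # layout tokens (NEWLINE, INDENT, comments, ...) are skipped


def simplify(source, keep_ratio):
    """Keep roughly keep_ratio of the content tokens, dropping the
    lowest-priority categories first and preserving source order."""
    stream = tokenize.generate_tokens(io.StringIO(source).readline)
    toks = [(t.string, categorize(t)) for t in stream]
    toks = [(s, c) for s, c in toks if c is not None]
    budget = max(1, round(len(toks) * keep_ratio))
    # Stable sort: within a category, earlier tokens are kept first.
    ranked = sorted(range(len(toks)), key=lambda i: -PRIORITY[toks[i][1]])
    kept = sorted(ranked[:budget])
    return [toks[i][0] for i in kept]


if __name__ == "__main__":
    src = "def add(a, b):\n    return a + b\n"
    print(simplify(src, 0.5))
```

The filter never consults a model, so the same simplified input could be passed to any LLM. If the cost really is quadratic in input length, as the abstract states, keeping half of the tokens would reduce that quadratic term to roughly a quarter of its original value.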
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques
