Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models

Yan Wang; Xiaoning Li; Tien Nguyen; Shaohua Wang; Chao Ni; Ling Ding

arXiv:2405.11196·cs.SE·May 21, 2024

TL;DR

This paper introduces SlimCode, a model-agnostic code simplification method that reduces computational complexity and improves efficiency for large language models in code tasks, without relying on model-specific attention patterns.

Contribution

We propose SlimCode, a novel, model-agnostic code simplification approach based on input token nature, enhancing efficiency and performance across multiple LLMs and tasks.

Findings

01. SlimCode improves code search and summarization metrics by over 5%.

02. SlimCode reduces GPT-4 API costs by up to 24%.

03. SlimCode is 133 times faster than previous methods.

Abstract

Pre-trained Large Language Models (LLMs) have achieved remarkable success in several domains. However, code-oriented LLMs are computationally heavy, with a cost that grows quadratically with the length of the input. To simplify the input program of an LLM, the state-of-the-art approach filters the input code tokens based on the attention scores given by the LLM. The decision to simplify the input should not rely on the attention patterns of an LLM, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input belongs, the outcome may differ when the model is pre-trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of input code tokens. As an empirical…
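
To make the idea of filtering by token nature concrete, here is a minimal Python sketch. It is not SlimCode's algorithm: the set of droppable categories (`DROPPABLE`), the helper names (`lex`, `simplify`, `attention_cost_ratio`), and the use of Python's standard `tokenize` module as the lexer are all assumptions made for illustration. The sketch only shows that a token can be kept or dropped from its lexical category alone, without consulting any model's attention scores, and how a shorter input affects the quadratic self-attention term.

```python
import io
import tokenize

# Hypothetical choice of categories to drop; NOT taken from the paper.
DROPPABLE = {
    tokenize.OP,         # symbols such as ( ) , : +
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def lex(source: str) -> list[tuple[int, str]]:
    """Return (token_type, text) pairs for a Python snippet."""
    reader = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(reader)]


def simplify(tokens: list[tuple[int, str]]) -> list[str]:
    """Keep tokens whose lexical category is not in DROPPABLE.

    The decision depends only on the token itself, so the result is the
    same whichever model later consumes the simplified input.
    """
    return [text for kind, text in tokens if kind not in DROPPABLE]


def attention_cost_ratio(original_len: int, simplified_len: int) -> float:
    """Relative cost of the quadratic self-attention term after shortening.

    Only the O(n^2) attention term is modeled here; other costs that grow
    linearly with input length are ignored.
    """
    return (simplified_len / original_len) ** 2


snippet = "def add(a, b):\n    return a + b  # sum\n"
kept = simplify(lex(snippet))
print(kept)
print(attention_cost_ratio(100, 70))
```

Because the attention term is quadratic, removing 30% of the tokens in this toy model leaves roughly 49% of that term's cost, which is why even modest input simplification can pay off.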

Code & Models

Repositories

gksajy/slimcode (PyTorch, Official)

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques