DCE-LLM: Dead Code Elimination with Large Language Models
Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu

TL;DR
DCE-LLM is an automated framework that uses large language models and attribution techniques to detect, explain, and eliminate dead code across multiple programming languages, improving software quality and security.
Contribution
It introduces a novel, automated dead code elimination method leveraging LLMs and attribution-based line selection, outperforming existing tools in accuracy and language support.
Findings
Achieves over 94% F1 score in dead code detection
Outperforms GPT-4o by 30% in accuracy
Supports multiple programming languages
Abstract
Dead code introduces several challenges in software development, such as increased binary size and maintenance difficulties. It can also obscure logical errors and be exploited for obfuscation in malware. For LLM-based code-related tasks, dead code introduces vulnerabilities that can mislead these models, raising security concerns. Although modern compilers and IDEs offer dead code elimination, sophisticated patterns can bypass these tools. A universal approach that includes classification, location, explanation, and correction is needed, yet current tools often require significant manual effort. We present DCE-LLM, a framework for automated dead code elimination using a small CodeBERT model with an attribution-based line selector to efficiently locate suspect code. LLMs then generate judgments and explanations, fine-tuned on a large-scale, annotated dead code dataset to provide…
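The attribution-based line selection mentioned in the abstract can be illustrated with a minimal sketch. The summary does not say how the authors compute attributions or aggregate them, so everything here is an assumption: the function name `select_suspect_lines`, the choice to sum token-level scores per line, and the example scores are all hypothetical. The idea shown is only that per-token attribution scores from a classifier can be pooled by source line, and the highest-scoring lines passed on as suspects for an LLM to judge.

```python
from collections import defaultdict


def select_suspect_lines(token_lines, token_scores, top_k=3):
    """Pool token-level attribution scores by line and return the top_k line numbers.

    token_lines[i] is the source line of token i; token_scores[i] is its
    attribution score. Summing per line is an assumed aggregation rule,
    not one confirmed by the paper summary.
    """
    if len(token_lines) != len(token_scores):
        raise ValueError("token_lines and token_scores must have equal length")

    line_scores = defaultdict(float)
    for line_no, score in zip(token_lines, token_scores):
        line_scores[line_no] += score

    # Sort by descending score; break ties by ascending line number.
    ranked = sorted(line_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [line_no for line_no, _ in ranked[:top_k]]


# Hypothetical attribution scores for eight tokens spread over four lines.
token_lines = [1, 1, 2, 2, 2, 3, 4, 4]
token_scores = [0.05, 0.10, 0.40, 0.30, 0.20, 0.02, 0.15, 0.25]

suspects = select_suspect_lines(token_lines, token_scores, top_k=2)
print(suspects)  # [2, 4]
```

In a fuller pipeline, the scores would come from an attribution method applied to the classifier, and only the selected lines (with surrounding context) would be sent to the LLM, which keeps the prompt short.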
Taxonomy
Topics: Natural Language Processing Techniques · Web Application Security Vulnerabilities · Text Readability and Simplification
