aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing

Siyuan Jiang; Jia Li; He Zong; Huanyu Liu; Hao Zhu; Shukai Hu; Erlu Li; Jiazheng Ding; Yu Han; Wei Ning; Gen Wang; Yihong Dong; Kechi Zhang; Ge Li

arXiv:2410.13187·cs.CL·January 17, 2025

Open Access · 1 Repo

TL;DR

aiXcoder-7B is a compact yet highly accurate code completion language model that leverages multi-objective training, diverse data sampling, and extensive high-quality data to outperform larger models.

Contribution

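The paper introduces aiXcoder-7B, a lightweight 7-billion-parameter LLM for code completion. It combines multi-objective training, including the proposed Structured Fill-In-the-Middle (SFIM) objective, with diverse data sampling strategies, and reports higher completion accuracy than larger models.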

Findings

1. aiXcoder-7B outperforms six similar-sized LLMs in code completion benchmarks.

2. It surpasses larger models like StarCoder2-15B and CodeLlama-34B in accuracy.

3. The model has been open-sourced and widely adopted, with over 2,200 GitHub stars.

Abstract

Large Language Models (LLMs) have been widely used in code completion, and researchers are focusing on scaling up LLMs to improve their accuracy. However, larger LLMs have lower inference efficiency, affecting developers' experience and productivity. In this paper, we propose a lightweight and effective LLM for code completion named aiXcoder-7B. Compared to existing LLMs, aiXcoder-7B achieves higher code completion accuracy at a smaller scale (i.e., 7 billion parameters). We attribute the superiority of aiXcoder-7B to three key factors: (1) Multi-objective training. We employ three training objectives, one of which is our proposed Structured Fill-In-the-Middle (SFIM). SFIM considers the syntax structures in code and effectively improves the performance of LLMs for code. (2) Diverse data sampling strategies. They consider inter-file relationships and enhance the capability of…
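
The abstract says only that SFIM takes the syntax structure of code into account when building fill-in-the-middle training samples. As a rough illustration of that idea, and not the paper's implementation, the sketch below masks one complete syntax node (a statement or expression found with Python's standard `ast` module) instead of an arbitrary character span. The sentinel strings, the prefix-suffix-middle ordering, and the helper names `syntax_spans`, `split_on_syntax_node`, and `format_fim` are all assumptions made for this example.

```python
import ast
import random

# Hypothetical sentinel strings, chosen for illustration only.
# The actual special tokens used by aiXcoder-7B are not given in the abstract.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def syntax_spans(source: str) -> list[tuple[int, int]]:
    """Return (start, end) byte spans of every statement and expression node."""
    data = source.encode("utf-8")
    # Byte offset at which each line starts; ast reports columns in UTF-8 bytes.
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.stmt, ast.expr)):
            start = line_starts[node.lineno - 1] + node.col_offset
            end = line_starts[node.end_lineno - 1] + node.end_col_offset
            spans.append((start, end))
    return spans


def split_on_syntax_node(source: str, rng: random.Random) -> tuple[str, str, str]:
    """Pick one syntax node at random and split the file around it.

    Raises IndexError if the source contains no statements or expressions.
    """
    data = source.encode("utf-8")
    start, end = rng.choice(syntax_spans(source))
    prefix, middle, suffix = data[:start], data[start:end], data[end:]
    return prefix.decode("utf-8"), middle.decode("utf-8"), suffix.decode("utf-8")


def format_fim(prefix: str, middle: str, suffix: str) -> str:
    """Lay out a training sample in prefix-suffix-middle order (an assumption)."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


if __name__ == "__main__":
    src = "def add(a, b):\n    return a + b\n"
    p, m, s = split_on_syntax_node(src, random.Random(0))
    print(format_fim(p, m, s))
```

Because the masked span always coincides with a whole node, the target the model has to predict is a syntactically complete unit such as `a + b` or `return a + b`, rather than a fragment cut at an arbitrary character position. That is the contrast with plain FIM this sketch is meant to show.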

Code & Models

Repositories

aixcoder-plugin/aixcoder-7b
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning in Bioinformatics · Topic Modeling