TL;DR
UniXcoder is a unified cross-modal pre-trained model for programming languages that effectively combines code, comments, and ASTs to improve performance on various code understanding and generation tasks.
Contribution
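It introduces a unified cross-modal model whose behavior is controlled by mask attention matrices with prefix adapters, rather than a fixed encoder-decoder framework, together with a one-to-one mapping method that encodes ASTs as sequences while retaining the tree's structural information, enhancing code representation.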
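To make the idea of encoding an AST as a sequence concrete, here is a minimal sketch of one possible one-to-one mapping. The summary does not spell out the actual scheme, so the bracket-style tokens (`name::left` / `name::right`), the `Node` class, and the `flatten` / `unflatten` helpers are all illustrative assumptions, not the authors' implementation. The point it shows is that wrapping each non-leaf node in a pair of boundary tokens lets the tree be recovered exactly from the flat sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a label plus an ordered list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[str]:
    """Map a tree to a token sequence (assumed bracket-style scheme)."""
    if not node.children:
        return [node.name]  # a leaf is emitted as its own label
    seq = [f"{node.name}::left"]  # opening boundary of a non-leaf node
    for child in node.children:
        seq.extend(flatten(child))
    seq.append(f"{node.name}::right")  # closing boundary
    return seq


def unflatten(tokens: List[str]) -> Node:
    """Inverse of flatten; assumes leaf labels never end in ::left or ::right."""
    stack: List[Node] = []
    root = None
    for tok in tokens:
        if tok.endswith("::left"):
            node = Node(tok[: -len("::left")])
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
        elif tok.endswith("::right"):
            root = stack.pop()  # the last node popped is the outermost one
        else:
            leaf = Node(tok)
            if stack:
                stack[-1].children.append(leaf)
            else:
                root = leaf
    return root


# Toy tree for something like `x = a + b` (node labels are made up).
tree = Node("assign", [Node("x"), Node("binop", [Node("a"), Node("+"), Node("b")])])
tokens = flatten(tree)
```

Because `unflatten(flatten(tree))` rebuilds the same tree, no structural information is lost in this toy scheme, which is the property a one-to-one mapping needs.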
Findings
- Achieves state-of-the-art results on multiple code tasks
- Effectively leverages comments and ASTs for improved performance
- Introduces a zero-shot code-to-code search dataset
Abstract
Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models. However, such encoder-decoder framework is sub-optimal for auto-regressive tasks, especially code completion that requires a decoder-only manner for efficient inference. In this paper, we present UniXcoder, a unified cross-modal pre-trained model for programming language. The model utilizes mask attention matrices with prefix adapters to control the behavior of the model and leverages cross-modal contents like AST and code comment to enhance code representation. To encode AST that is represented as a tree in parallel, we propose a one-to-one mapping method to transform AST in a sequence structure that retains all structural information from the tree.…
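The abstract's mention of mask attention matrices can be illustrated with a small sketch. It assumes three modes (named here `encoder-only`, `decoder-only`, and `encoder-decoder`) and a hypothetical `build_mask` helper; neither the names nor the exact layout are taken from the paper, and prefix adapters are not modeled. A `1` at row `i`, column `j` means token `i` may attend to token `j`.

```python
from typing import List


def build_mask(mode: str, src_len: int, tgt_len: int = 0) -> List[List[int]]:
    """Return an (n x n) 0/1 attention mask for the assumed mode."""
    n = src_len + tgt_len
    if mode == "encoder-only":
        # Bidirectional: every token sees every token.
        return [[1] * n for _ in range(n)]
    if mode == "decoder-only":
        # Causal: token i sees only positions 0..i.
        return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    if mode == "encoder-decoder":
        mask = []
        for i in range(n):
            row = []
            for j in range(n):
                if j < src_len:
                    row.append(1)  # source positions are visible to all tokens
                else:
                    # Target positions are visible only to target tokens at or after j.
                    row.append(1 if (i >= src_len and j <= i) else 0)
            mask.append(row)
        return mask
    raise ValueError(f"unknown mode: {mode}")


causal = build_mask("decoder-only", 3)
seq2seq = build_mask("encoder-decoder", src_len=2, tgt_len=2)
```

Under these assumptions, one set of Transformer weights can behave as an encoder, a decoder, or an encoder-decoder simply by swapping the mask, which is why a decoder-only mode for code completion does not require a separate model.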
Code & Models
- 🤗 microsoft/unixcoder-base (model) · 188k downloads · ♡ 67
- 🤗 microsoft/unixcoder-base-nine (model) · 5.3k downloads · ♡ 22
- 🤗 Lazyhope/RepoSim (model) · 43 downloads · ♡ 1
- 🤗 HuanWang/test (model) · 24 downloads
- 🤗 Henry65/RepoSim4Py (model) · 13 downloads
- 🤗 claudios/unixcoder-base (model) · 220 downloads
- 🤗 claudios/unixcoder-base-unimodal (model) · 10 downloads
- 🤗 codemetic/cwebert (model) · 6 downloads
