CoTexT: Multi-task Learning with Code-Text Transformer
Long Phan, Hieu Tran, Daniel Le, Hieu Nguyen, James Anibal, Alec Peltekian, and Yanfang Ye

TL;DR
CoTexT is a transformer-based model pre-trained on large code and text corpora. It performs well on a range of natural-language/programming-language tasks, such as code summarization, code generation, and defect detection, and achieves state-of-the-art results.
Contribution
The paper introduces CoTexT, a novel multi-task pre-trained transformer model that effectively handles multiple NL-PL tasks with state-of-the-art performance.
Findings
Achieves SOTA results on CodeXGLUE tasks
Effective multi-task learning across diverse programming languages
Versatile performance on code summarization, generation, and debugging
Abstract
We present CoTexT, a pre-trained, transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL). Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. CoTexT supports downstream NL-PL tasks such as code summarization/documentation, code generation, defect detection, and code debugging. We train CoTexT on different combinations of available PL corpora, including both "bimodal" and "unimodal" data. Here, bimodal data is the combination of text and corresponding code snippets, whereas unimodal data is merely code snippets. We first evaluate CoTexT with multi-task learning: we perform Code Summarization on 6 different programming languages and Code Refinement on both the small and medium-sized datasets featured in the CodeXGLUE benchmark. We further…
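To make the bimodal/unimodal distinction and the self-supervised setup concrete, here is a minimal sketch of how such training inputs could be built. This summary does not state CoTexT's exact objective or input format, so the sketch assumes a T5-style span-corruption objective with `<extra_id_N>` sentinels, uses whitespace splitting in place of a real subword tokenizer, and uses a hypothetical `"<task> <lang>: "` prefix to mix tasks during multi-task learning. All function names are illustrative, not taken from the paper.

```python
import random

# T5-style sentinel format; an assumption, not taken from the CoTexT paper.
SENTINEL = "<extra_id_{}>"


def make_bimodal(text: str, code: str) -> list[str]:
    """Bimodal example: natural-language text followed by its code snippet."""
    return f"{text} {code}".split()  # whitespace split stands in for a subword tokenizer


def make_unimodal(code: str) -> list[str]:
    """Unimodal example: the code snippet alone."""
    return code.split()


def sample_spans(n_tokens, rng, noise_density=0.15, span_len=3):
    """Pick non-overlapping (start, length) spans covering roughly noise_density of the tokens."""
    if n_tokens < span_len:
        return []
    # Starts lie on a grid of step span_len, so fixed-length spans never overlap.
    candidates = list(range(0, n_tokens - span_len + 1, span_len))
    n_noise = max(1, round(n_tokens * noise_density))
    n_spans = min(len(candidates), max(1, n_noise // span_len))
    return [(s, span_len) for s in rng.sample(candidates, n_spans)]


def span_corrupt(tokens, spans):
    """Replace each span with a sentinel; the target lists each sentinel followed by the tokens it hid."""
    inputs, targets, pos = [], [], 0
    for k, (start, length) in enumerate(sorted(spans)):
        inputs += tokens[pos:start] + [SENTINEL.format(k)]
        targets += [SENTINEL.format(k)] + tokens[start:start + length]
        pos = start + length
    inputs += tokens[pos:]
    targets.append(SENTINEL.format(len(spans)))  # closing sentinel marks the end of the target
    return inputs, targets


def with_task_prefix(task: str, lang: str, source: str) -> str:
    """Hypothetical prefix format for mixing tasks in one text-to-text model."""
    return f"{task} {lang}: {source}"


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [
        make_bimodal("Add two numbers.", "def add ( a , b ) : return a + b"),
        make_unimodal("int sub ( int a , int b ) { return a - b ; }"),
    ]
    for tokens in corpus:
        inp, tgt = span_corrupt(tokens, sample_spans(len(tokens), rng))
        print(" ".join(inp), "=>", " ".join(tgt))

    # Multi-task fine-tuning: examples from different tasks share one model,
    # distinguished only by their (hypothetical) prefixes, and are shuffled together.
    batch = [
        with_task_prefix("summarize", "python", "def add ( a , b ) : return a + b"),
        with_task_prefix("refine", "java", "int sub ( int a , int b ) { return a + b ; }"),
    ]
    rng.shuffle(batch)
    print(batch)
```

Grid-aligned, fixed-length spans keep the sampler short and guarantee no overlap. T5 itself samples variable span lengths, so a faithful reproduction would need a more careful sampler.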
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
