A new approach for encoding code and assisting code understanding
Mengdan Fan, Wei Zhang, Haiyan Zhao, Zhi Jin

TL;DR
This paper introduces a novel code encoding paradigm inspired by diffusion techniques, enabling better global understanding and zero-shot prediction for code comprehension tasks, surpassing traditional autoregressive models.
Contribution
The paper proposes a heterogeneous image-based code encoding method inspired by diffusion models, improving code understanding and enabling zero-shot predictions, addressing limitations of autoregressive paradigms.
Findings
Achieved zero-shot prediction on 456,360 text-code pairs
Demonstrated improved global understanding of code
Proposed a new paradigm for code encoding inspired by diffusion models
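As a rough illustration of what zero-shot prediction over text-code pairs can look like in a contrastive (CLIP-style) setup, the minimal sketch below scores one text embedding against several candidate code embeddings by cosine similarity and picks the best match. This is not the paper's implementation: the encoders are not described on this page, so the embedding vectors, the snippet names, and the function names (`l2_normalize`, `cosine_similarity`, `zero_shot_match`) are all hypothetical placeholders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_match(text_embedding, code_embeddings):
    """Return the best-matching snippet name and all similarity scores.

    No classifier is trained for this task: the prediction comes only
    from comparing embeddings in a shared space.
    """
    scores = {
        name: cosine_similarity(text_embedding, emb)
        for name, emb in code_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings standing in for the output of a code encoder.
code_embeddings = {
    "sort_list": [0.9, 0.1, 0.0],
    "read_file": [0.0, 0.8, 0.6],
    "http_get": [0.1, 0.0, 0.95],
}

# Hypothetical embedding of a text query such as "sort a list of numbers".
text_embedding = [0.85, 0.2, 0.05]

best, scores = zero_shot_match(text_embedding, code_embeddings)
print(best)
```

In a real system the vectors would come from trained text and code encoders; only the matching step is shown here.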
Abstract
Several organizations (e.g., Microsoft Research and Google DeepMind) have identified limitations of the autoregressive next-word prediction paradigm used by GPTs, which manifest as a lack of planning, working memory, backtracking, and reasoning skills. GPTs rely on a local, greedy process of generating the next word, without a global understanding of the task or the output. We confirmed these limitations through specialized empirical studies of code comprehension. Although GPT-4 is good at producing fluent and coherent text, it cannot handle complex logic or generate new code that it has not seen before, and it relies too heavily on the formatting of the prompt to generate correct code. We propose a new paradigm for code understanding that goes beyond next-word prediction, inspired by the successful application of diffusion techniques to image generation (DALL-E 2,…
Taxonomy
Topics: Teaching and Learning Programming
Methods: Contrastive Language-Image Pre-training · Diffusion
