A new approach for encoding code and assisting code understanding

Mengdan Fan; Wei Zhang; Haiyan Zhao; Zhi Jin

arXiv:2408.00521 · cs.AI · March 25, 2025


TL;DR

This paper introduces a novel code encoding paradigm inspired by diffusion techniques, enabling better global understanding and zero-shot prediction for code comprehension tasks, surpassing traditional autoregressive models.

Contribution

The paper proposes a heterogeneous image-based code encoding method inspired by diffusion models, improving code understanding and enabling zero-shot predictions, addressing limitations of autoregressive paradigms.

Findings

01. Achieved zero-shot prediction on 456,360 text-code pairs

02. Demonstrated improved global understanding of code

03. Proposed a new paradigm for code encoding inspired by diffusion models

Abstract

Several research groups (e.g., Microsoft Research and Google DeepMind) have identified limitations of the autoregressive next-word prediction paradigm used by GPTs, manifested in the models' lack of planning, working memory, backtracking, and reasoning skills. GPTs rely on a local and greedy process of generating the next word, without a global understanding of the task or the output. We have confirmed these limitations through specialized empirical studies of code comprehension. Although GPT-4 is good at producing fluent and coherent text, it cannot handle complex logic or generate code it has not seen before, and it relies too heavily on the formatting of the prompt to generate correct code. We propose a new paradigm for code understanding that goes beyond the next-word prediction paradigm, inspired by the successful application of diffusion techniques to image generation (DALL-E 2,…
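The "local and greedy" limitation described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or any real GPT decoder; the probability table `TOY_MODEL` and the functions `greedy_decode` and `exhaustive_decode` are hypothetical names invented for illustration. It only shows how picking the most probable next token at each step can miss the sequence with the highest overall probability.

```python
# Hypothetical toy model: P(next token | previous token).
# The numbers are made up purely to illustrate the effect.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def greedy_decode(model, start="<s>", steps=2):
    """Pick the single most probable next token at every step."""
    seq, prob, prev = [], 1.0, start
    for _ in range(steps):
        token, p = max(model[prev].items(), key=lambda kv: kv[1])
        seq.append(token)
        prob *= p
        prev = token
    return seq, prob


def exhaustive_decode(model, start="<s>", steps=2):
    """Score every possible sequence and return the most probable one."""
    best_seq, best_prob = [], 0.0

    def expand(prev, seq, prob):
        nonlocal best_seq, best_prob
        if len(seq) == steps:
            if prob > best_prob:
                best_seq, best_prob = seq, prob
            return
        for token, p in model.get(prev, {}).items():
            expand(token, seq + [token], prob * p)

    expand(start, [], 1.0)
    return best_seq, best_prob


if __name__ == "__main__":
    print("greedy:    ", greedy_decode(TOY_MODEL))
    print("exhaustive:", exhaustive_decode(TOY_MODEL))
```

In this toy table, the greedy decoder commits to `a` because it is the likeliest first token, while the globally most probable sequence starts with the less likely `b`. Real language models are far larger and use other decoding strategies as well, so this is only an intuition aid for the abstract's claim.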


Taxonomy

Topics: Teaching and Learning Programming

Methods: Contrastive Language-Image Pre-training · Diffusion