Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code
Md Rafiqul Islam Rabin, Mohammad Amin Alipour

TL;DR
This paper introduces Code2Snapshot, a novel source code representation based on program snapshots, which performs comparably to state-of-the-art methods in code summarization and classification tasks, suggesting that for some tasks neural models may rely more on the structure of code than on its semantic details.
Contribution
The paper proposes Code2Snapshot, a new code representation method using program snapshots, and demonstrates its effectiveness and robustness in code understanding tasks.
Findings
Code2Snapshot performs comparably to state-of-the-art representations that use rich syntactic and semantic features.
Obscuring code has little impact on Code2Snapshot performance.
Neural models may rely mainly on code structure for some tasks.
Abstract
There are several approaches for encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization and code classification tasks suggests that simple snapshots of input programs have comparable performance to state-of-the-art representations. Interestingly, obscuring input programs has an insignificant impact on Code2Snapshot performance, suggesting that, for some tasks,…
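To make the idea of a snapshot and its obscured variant concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' pipeline: the helper names `redact` and `to_grid`, the placeholder character `x`, and the use of a character-cell grid instead of a rendered pixel image are all assumptions introduced here. The sketch obscures a program by replacing every alphanumeric character while keeping layout and punctuation, then rasterizes both versions into a fixed-size binary grid.

```python
from __future__ import annotations


def redact(source: str, placeholder: str = "x") -> str:
    """Hypothetical obscuring step: replace every alphanumeric character
    with a placeholder, leaving whitespace and punctuation untouched."""
    return "".join(placeholder if ch.isalnum() else ch for ch in source)


def to_grid(source: str, width: int, height: int) -> list[list[int]]:
    """Hypothetical stand-in for a snapshot: a fixed-size binary grid
    where 1 marks a visible character and 0 marks empty space.
    Lines and columns beyond the grid size are cropped."""
    lines = source.expandtabs(4).splitlines()
    grid = [[0] * width for _ in range(height)]
    for row, line in enumerate(lines[:height]):
        for col, ch in enumerate(line[:width]):
            if not ch.isspace():
                grid[row][col] = 1
    return grid


program = "int add(int a, int b) {\n    return a + b;\n}\n"
obscured = redact(program)

original_grid = to_grid(program, width=32, height=4)
obscured_grid = to_grid(obscured, width=32, height=4)

# The layout mask is unchanged by obscuring: only the glyphs differ.
same_layout = original_grid == obscured_grid
```

In this simplified setting the two grids are identical, because obscuring changes which characters appear but not where they appear. A real rendered image would still differ in glyph shapes, so this sketch only illustrates why the structural layout of a program can survive obscuring.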
