SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and its Retrieval

Qing Huang; Dianshu Liao; Zhenchang Xing; Zhiqiang Yuan; Qinghua Lu; Xiwei Xu; Jiaxing Lu

arXiv:2212.08221 · cs.SE · December 19, 2022 · 5 cites

TL;DR

This study systematically investigates the factual knowledge of Fully Qualified Names (FQNs) in the GPT-based code model Copilot, proposing a lightweight in-context learning method for FQN inference that does not require code compilation or gradient updates.

Contribution

The paper introduces a novel in-context learning approach for FQN inference in Copilot and analyzes its effectiveness and the factors that influence its performance, advancing the understanding of pre-trained giant code models (PCMs) in software engineering.

Findings

01

Copilot stores diverse FQN knowledge and infers FQNs with high accuracy.

02

The proposed in-context learning method does not require code compilation or gradient updates.

03

An optimal in-context learning configuration improves FQN inference performance.

Abstract

Pre-trained giant code models (PCMs) are starting to enter developers' daily practice. Understanding what types of software knowledge, and how much of it, are packed into PCMs is the foundation for incorporating PCMs into software engineering (SE) tasks and fully realizing their potential. In this work, we conduct the first systematic study of SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs' Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs' data distribution properties, we design a novel lightweight in-context learning method on Copilot for FQN inference, which requires neither code compilation, as traditional methods do, nor gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context-learning design factors to identify the best in-context learning configuration that…
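To make the idea of in-context learning for FQN inference concrete, the following is a minimal sketch of how a few-shot prompt might be assembled: a handful of demonstrations pair a code line and a simple API name with its FQN, and the final entry leaves the FQN blank for the frozen model to complete. The prompt layout, the `build_fqn_prompt` helper, and the choice of demonstrations are illustrative assumptions, not the format used in the paper; no model is called here.

```python
from typing import List, Tuple

# (code line, simple name, fully qualified name)
Demo = Tuple[str, str, str]


def build_fqn_prompt(demos: List[Demo], query_code: str, query_name: str) -> str:
    """Assemble a few-shot prompt for FQN inference.

    The comment-style layout is a hypothetical choice for illustration,
    not the prompt format reported in the paper.
    """
    blocks = []
    for code, name, fqn in demos:
        blocks.append(f"// Code: {code}\n// Simple name: {name}\n// FQN: {fqn}")
    # The last block leaves the FQN empty so the model fills it in.
    blocks.append(f"// Code: {query_code}\n// Simple name: {query_name}\n// FQN:")
    return "\n\n".join(blocks)


# Demonstrations use well-known Java standard library classes.
demos: List[Demo] = [
    ("List<String> xs = new ArrayList<>();", "ArrayList", "java.util.ArrayList"),
    ("Map<String, Integer> m = new HashMap<>();", "HashMap", "java.util.HashMap"),
]

prompt = build_fqn_prompt(demos, "File f = new File(path);", "File")
print(prompt)
```

The resulting string would be sent to the code model as a completion prompt, with no compilation of the code and no gradient updates to the model; how many demonstrations to include and how to select them are the kind of design factors the study varies.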

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Software System Performance and Reliability