SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and its Retrieval
Qing Huang, Dianshu Liao, Zhenchang Xing, Zhiqiang Yuan, Qinghua Lu, Xiwei Xu, Jiaxing Lu

TL;DR
This study systematically investigates the factual knowledge of Fully Qualified Names (FQNs) in the GPT-based code model Copilot, proposing a lightweight in-context learning method for FQN inference that does not require code compilation or gradient updates.
Contribution
The paper introduces a novel in-context learning approach for FQN inference in Copilot and analyzes its effectiveness and the factors that influence its performance, advancing the understanding of pre-trained code models (PCMs) in software engineering.
Findings
Copilot stores diverse FQN knowledge with high inference accuracy.
The proposed in-context learning method does not require code compilation or gradient updates.
An optimal in-context learning configuration improves FQN inference performance.
Abstract
Pre-trained giant code models (PCMs) are starting to enter developers' daily practice. Understanding what types of software knowledge, and how much of it, is packed into PCMs is the foundation for incorporating PCMs into software engineering (SE) tasks and fully releasing their potential. In this work, we conduct the first systematic study on the SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs' Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs' data distribution properties, we design a novel lightweight in-context learning method on Copilot for FQN inference, which requires neither code compilation, as traditional methods do, nor gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context-learning design factors to identify the best in-context learning configuration that…
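To make the idea of in-context FQN inference concrete, here is a minimal sketch of how a few-shot prompt might be assembled for a frozen code model. The function name `build_fqn_prompt`, the comment markers, and the demonstration pairs are all hypothetical and chosen only for illustration; the paper's actual prompt layout and its five design factors are not specified in this summary.

```python
from typing import List, Tuple

# Hypothetical demonstration pair: (code using simple names, same code using FQNs).
Example = Tuple[str, str]


def build_fqn_prompt(examples: List[Example], query: str) -> str:
    """Assemble a few-shot prompt for FQN inference.

    The layout (comment markers, ordering) is an illustrative assumption,
    not the format used in the paper.
    """
    parts = []
    for simple, qualified in examples:
        parts.append(
            f"// simple name\n{simple}\n// fully qualified name\n{qualified}\n"
        )
    # The query ends with an open slot that the frozen model would complete.
    parts.append(f"// simple name\n{query}\n// fully qualified name\n")
    return "\n".join(parts)


# Illustrative Java snippets; any real configuration would select examples differently.
examples = [
    ("List<String> names = new ArrayList<>();",
     "java.util.List<java.lang.String> names = new java.util.ArrayList<>();"),
    ("File f = new File(path);",
     "java.io.File f = new java.io.File(path);"),
]
prompt = build_fqn_prompt(examples, "Map<String, Integer> counts = new HashMap<>();")
print(prompt)
```

The prompt is plain text handed to the model as-is. No code is compiled and no model weights are updated, which is the property the abstract highlights.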
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software System Performance and Reliability
