Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding
Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

TL;DR
This paper introduces CoCo, a framework that enhances code completion by understanding multi-level structural and semantic context from large codebases, significantly improving accuracy over existing methods.
Contribution
CoCo uniquely integrates static code analysis with graph-based context selection and structure-aware re-ranking to improve code generation quality.
Findings
Achieves up to 20.2% EM improvement on benchmarks.
Effectively captures control flow and semantic dependencies.
Compatible with various code completion models.
Abstract
As code completion task from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as plain natural language, relying primarily on shallow semantic matching while overlooking structural semantics and code-specific dependencies. This limits their ability to capture control flow and underlying intent, ultimately constraining the quality of generated code. Therefore, we propose CoCo, a novel framework that enables code Completion by Comprehension of multi-granularity context from large-scale code repositories. CoCo employs static code analysis to extract structured context at the function, file, and project levels, capturing execution logic and semantic dependencies. It then adopts an graph-based multi-granularity context selection…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
