Code Comprehension with GitHub Copilot: Performance Gains, Comprehension Trade-offs, and Behavioral Predictors in Brownfield Programming
Yunhan Qiao, Md Istiak Hossain Shihab, Summit Haque, and Christopher Hundhausen

TL;DR
This study investigates how GitHub Copilot affects students' code comprehension. Copilot improves task performance, but it may hinder understanding unless students actively engage with the generated code.
Contribution
The paper identifies a decoupling between performance and comprehension when students use Copilot, and it highlights the importance of verification behaviors for understanding.
Findings
Performance improved with Copilot, but overall comprehension did not.
Active verification of generated code predicts better comprehension.
Passive use of Copilot correlates with reduced understanding.
Abstract
Teaching Computer Science (CS) students how to comprehend and maintain legacy code bases is a critical challenge in software engineering education. While Generative AI (GenAI) assistants like GitHub Copilot improve task completion speed and correctness, their impact on code understanding remains unclear. We conducted a within-subject study with 15 graduate CS students completing feature implementation tasks with and without Copilot. Despite significant performance improvements, participants showed no overall comprehension improvement, revealing a comprehension-performance decoupling. Further analysis uncovered a comprehension trade-off: performance gains correlated negatively with reverse engineering comprehension but showed a positive trend with implementation comprehension. A follow-up behavioral analysis…
