Data-Driven AI Model Signal-Awareness Enhancement and Introspection
Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Jim Laredo, Alessandro Morari

TL;DR
This paper proposes data-driven methods that improve AI models' ability to recognize task-relevant signals in source code. It pairs code complexity with curriculum learning and augments the training dataset with simplified, signal-preserving programs, yielding up to a 4.8x improvement in signal awareness.
Contribution
It introduces a novel approach that integrates code complexity and curriculum learning, along with dataset augmentation, to enhance AI models' signal-awareness in source code understanding.
Findings
Up to 4.8x improvement in model signal awareness.
Combines code complexity with curriculum learning effectively.
Introduces a dataset-based model introspection method.
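To make the complexity-plus-curriculum idea concrete, here is a minimal sketch of ordering training samples from simple to complex and releasing them in cumulative stages. The complexity measure (a count of branching tokens, loosely in the spirit of cyclomatic complexity), the function names, and the three-stage schedule are all assumptions for illustration; this summary does not state which metric or schedule the paper actually uses.

```python
import re
from typing import List, Sequence

# Crude stand-in for a code complexity metric (assumption: not the paper's metric).
# Counts branching keywords and short-circuit/ternary operators.
_BRANCH = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def complexity_proxy(code: str) -> int:
    """Return 1 plus the number of branching tokens found in the code."""
    return 1 + len(_BRANCH.findall(code))


def curriculum_stages(samples: Sequence[str], n_stages: int = 3) -> List[List[str]]:
    """Sort samples easy-to-hard and return cumulative training stages.

    Stage k holds the easiest k/n_stages fraction of the data (rounded up),
    so the final stage is the full dataset.
    """
    ordered = sorted(samples, key=complexity_proxy)
    stages = []
    for k in range(1, n_stages + 1):
        cutoff = (len(ordered) * k + n_stages - 1) // n_stages  # integer ceiling
        stages.append(ordered[:cutoff])
    return stages
```

A training loop would then fit the model on each stage in turn, so that simpler programs are seen before more complex ones.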
Abstract
AI modeling for source code understanding tasks has been making significant progress, and is being adopted in production development pipelines. However, reliability concerns, especially whether the models are actually learning task-related aspects of source code, are being raised. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e. models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal-awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified signal-preserving programs, augmenting them to the training dataset. With our techniques, we achieve up to 4.8x improvement in…
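The Delta Debugging step mentioned in the abstract can be illustrated with a generic line-level minimizer. This is a simplified sketch of the classic ddmin idea, not the authors' customized version: the `keeps_signal` oracle is hypothetical and stands in for whatever check confirms that a reduced program still carries the task-relevant signal.

```python
from typing import Callable, List


def ddmin(items: List[str], keeps_signal: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `items` while `keeps_signal` stays true (simplified ddmin).

    Repeatedly tries to drop one chunk at a time; when no chunk can be
    dropped, the chunks are made smaller until they are single items.
    """
    if not keeps_signal(items):
        raise ValueError("the original input must preserve the signal")
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Candidate program: everything except chunk i.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if keeps_signal(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(items):
                break  # already at single-item granularity; nothing more to drop
            n = min(n * 2, len(items))
    return items
```

Each reduced program that passes the oracle could then be added to the training set alongside the original, which is the augmentation step the abstract describes.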
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques
