CodeSCM: Causal Analysis for Multi-Modal Code Generation
Mukur Gupta, Noopur Bhatt, Suman Jana

TL;DR
This paper introduces CodeSCM, a causal analysis framework for multi-modal code generation with large language models, revealing how different prompt modalities influence generated code.
Contribution
It presents a novel Structural Causal Model with latent mediators to analyze and quantify the effects of prompt modalities on code generation.
Findings
Input-output examples significantly influence code generation
Natural language instructions impact model outputs
Causal mediation analysis quantifies direct effects of prompt modalities
Abstract
In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt modalities, such as natural language, code, and input-output examples, on the model. CodeSCM introduces latent mediator variables to separate the code and natural language semantics of a multi-modal code generation prompt. Using the principles of Causal Mediation Analysis on these mediators, we quantify direct effects representing the model's spurious leanings. We find that, in addition to natural language instructions, input-output examples significantly influence code generation.
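The direct and indirect effects mentioned in the abstract follow the standard causal mediation decomposition. The sketch below is a minimal illustration and is not the authors' implementation. It assumes three things: each problem's outcome is a binary pass/fail of the generated code against its tests; outcomes have been collected under three prompt settings (the original prompt, an intervened prompt, and an intervened prompt with the mediator held at its original value); and the lists are aligned per problem. How CodeSCM actually constructs these interventions is not reproduced here. The function names and toy data are hypothetical.

```python
from typing import Dict, Sequence


def _pass_rate(outcomes: Sequence[int]) -> float:
    """Fraction of problems whose generated code passed (1 = pass, 0 = fail)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def mediation_effects(
    y_control: Sequence[int],
    y_treated: Sequence[int],
    y_treated_mediator_fixed: Sequence[int],
) -> Dict[str, float]:
    """Hypothetical helper computing standard mediation quantities.

    y_control:                X = x0, mediator M = M(x0)  (original prompt)
    y_treated:                X = x1, mediator M = M(x1)  (intervened prompt)
    y_treated_mediator_fixed: X = x1, mediator held at M(x0)
    """
    if not (len(y_control) == len(y_treated) == len(y_treated_mediator_fixed)):
        raise ValueError("all outcome lists must cover the same problems")
    p0 = _pass_rate(y_control)
    p1 = _pass_rate(y_treated)
    p1_m0 = _pass_rate(y_treated_mediator_fixed)
    te = p1 - p0        # total effect of the intervention
    nde = p1_m0 - p0    # natural direct effect: mediator path blocked
    tnie = p1 - p1_m0   # total natural indirect effect: change via the mediator only
    # By construction TE = NDE + TNIE.
    return {"TE": te, "NDE": nde, "TNIE": tnie}


if __name__ == "__main__":
    # Toy, made-up outcomes for five problems (not results from the paper).
    print(mediation_effects([1, 1, 1, 1, 0], [0, 1, 0, 1, 0], [1, 1, 0, 1, 0]))
```

With these toy lists the pass rates are 0.8, 0.4, and 0.6. The total effect is therefore about -0.4, split evenly between a direct effect and an indirect effect of about -0.2 each, up to floating-point rounding. A nonzero direct effect in this setup means the outcome changed even though the mediator was held fixed. This roughly corresponds to what the abstract calls the model's spurious leanings.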
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Software Testing and Debugging Techniques
