Program Semantic Inequivalence Game with Large Language Models
Antonio Valerio Miceli-Barone, Vaishak Belle, Ali Payani

TL;DR
This paper introduces SInQ, a semantic inequivalence game that synthetically generates training data for LLMs, improving both their ability to reason about complex program semantics and their performance on code understanding benchmarks.
Contribution
The paper proposes a novel semi-adversarial training method using semantic inequivalence games to enhance LLMs' reasoning about program semantics, with theoretical and empirical validation.
Findings
Improves vulnerability detection in C/C++ code despite training on Python data
Yields substantial gains on the Python builtin identifier swap benchmark
Provably enables theoretically unlimited improvement through self-play, in the limit of infinite computational resources
Abstract
Large Language Models (LLMs) can achieve strong performance on everyday coding tasks, but they can fail on complex tasks that require non-trivial reasoning about program semantics. Finding training examples to teach LLMs to solve these tasks can be challenging. In this work, we explore a method to synthetically generate code reasoning training data based on a semantic inequivalence game, SInQ. Starting from a dataset of real-world programming tasks, a generator agent creates program variants that are semantically distinct from the originals, while an evaluator agent has to identify input examples that cause the original programs and the generated variants to diverge in their behaviour. The two agents train each other semi-adversarially. We prove that this setup enables theoretically unlimited improvement through self-play in the limit of infinite computational resources. We evaluated our approach on…
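The evaluator's task described in the abstract, finding an input on which the original program and the generated variant behave differently, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the programs `original` and `variant`, the helper names `outcome`, `diverges`, and `find_diverging_input`, and the choice to treat a raised exception as observable behaviour are all assumptions made for illustration.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def outcome(program: Callable[[Any], Any], x: Any) -> Tuple[str, Any]:
    """Run `program` on `x` and summarise what happened.

    Assumption: a raised exception counts as observable behaviour,
    identified by its type name.
    """
    try:
        return ("ok", program(x))
    except Exception as exc:
        return ("error", type(exc).__name__)


def diverges(original: Callable, variant: Callable, x: Any) -> bool:
    """True if the two programs behave differently on input `x`."""
    return outcome(original, x) != outcome(variant, x)


def find_diverging_input(
    original: Callable, variant: Callable, candidates: Iterable[Any]
) -> Optional[Any]:
    """Return the first candidate input that separates the programs, else None."""
    for x in candidates:
        if diverges(original, variant, x):
            return x
    return None


# Hypothetical original program: arithmetic mean of a list.
def original(xs):
    return sum(xs) / len(xs)


# Hypothetical generator output: identical except on the empty list.
def variant(xs):
    if not xs:
        return 0.0
    return sum(xs) / len(xs)


if __name__ == "__main__":
    # Candidate inputs an evaluator might propose (illustrative only).
    print(find_diverging_input(original, variant, [[1, 2, 3], [4], []]))
```

In this toy case the two programs agree on every non-empty list, so an evaluator only succeeds by proposing the empty list, where `original` raises `ZeroDivisionError` and `variant` returns `0.0`. A variant that differs only on hard-to-find inputs is harder for the evaluator to separate from the original.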
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Engineering Research · Advanced Malware Detection Techniques
