# Are My Invariants Valid? A Learning Approach

**Authors:** Vincent J. Hellendoorn, Premkumar T. Devanbu, Oleksandr Polozov, and Mark Marron

arXiv: 1903.06089 · 2019-03-19

## TL;DR

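This paper presents a machine-learning approach, built on gated graph neural networks, that judges the validity of candidate invariants (specifically method pre- and post-conditions) directly from a method's source code, with the aim of filtering out the false positives that trace-based tools such as Daikon produce.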

## Contribution

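It introduces a scalable way to create labeled invariants: Daikon invariants are generated from traces of subsets of a program's test suite, then labeled valid or invalid by cross-validating them against held-out tests. These noisy labels are used to train a gated graph neural network that maps the lexical, syntactic, and semantic structure of a method's body to the probability that a candidate pre- or post-condition is correct.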

## Key findings

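- The model accurately labels invariants from the noisy training signal, even in cross-project settings.
- Training uses noisy supervision: labels derived from test-suite traces and checked against held-out tests.
- The model also performs well on a hand-curated dataset of invariants.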

## Abstract

Ensuring that a program operates correctly is a difficult task in large, complex systems. Enshrining invariants -- desired properties of correct execution -- in code or comments can support maintainability and help sustain correctness. Tools that can automatically infer and recommend invariants can thus be very beneficial. However, current invariant-suggesting tools, such as Daikon, suffer from high rates of false positives, in part because they only leverage traced program values from available test cases, rather than directly exploiting knowledge of the source code per se. We propose a machine-learning approach to judging the validity of invariants, specifically of method pre- and post-conditions, based directly on a method's source code. We introduce a new, scalable approach to creating labeled invariants: using programs with large test-suites, we generate Daikon invariants using traces from subsets of these test-suites, and then label these as valid/invalid by cross-validating them with held-out tests. This process induces a large set of labels that provide a form of noisy supervision, which is then used to train a deep neural model, based on gated graph neural networks. Our model learns to map the lexical, syntactic, and semantic structure of a given method's body into a probability that a candidate pre- or post-condition on that method's body is correct and is able to accurately label invariants based on the noisy signal, even in cross-project settings. Most importantly, it performs well on a hand-curated dataset of invariants.
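The labeling procedure described in the abstract can be illustrated with a toy sketch. This is not Daikon and not the authors' pipeline: the `CANDIDATES` templates, the `Trace` representation, and the `infer` and `label` helpers are all hypothetical stand-ins, assuming only that an invariant is a predicate over values traced at a method's entry or exit.

```python
from typing import Callable, Dict, List

# One trace = variable values observed at a single method entry/exit (assumed shape).
Trace = Dict[str, int]
Invariant = Callable[[Trace], bool]

# Hypothetical candidate templates, standing in for a Daikon-like invariant grammar.
CANDIDATES: Dict[str, Invariant] = {
    "x >= 0": lambda t: t["x"] >= 0,
    "x < 10": lambda t: t["x"] < 10,
    "ret == x + 1": lambda t: t["ret"] == t["x"] + 1,
}


def infer(traces: List[Trace]) -> List[str]:
    """Keep every candidate that holds on all of the given traces."""
    return [name for name, inv in CANDIDATES.items() if all(inv(t) for t in traces)]


def label(train: List[Trace], held_out: List[Trace]) -> Dict[str, bool]:
    """Infer invariants from a test subset, then mark each one valid
    only if no held-out trace falsifies it."""
    return {
        name: all(CANDIDATES[name](t) for t in held_out)
        for name in infer(train)
    }


# Traces from a subset of the tests, and traces from the held-out tests.
train_traces = [{"x": 1, "ret": 2}, {"x": 3, "ret": 4}]
held_out_traces = [{"x": 12, "ret": 13}, {"x": 0, "ret": 1}]

labels = label(train_traces, held_out_traces)
print(labels)
```

In this sketch `x < 10` is inferred from the small training subset but falsified by a held-out trace, so it is labeled invalid. A label of valid only means that no held-out test contradicted the invariant, which is one way to read the abstract's description of the labels as noisy supervision.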

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.06089/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1903.06089/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1903.06089/full.md

---
Source: https://tomesphere.com/paper/1903.06089