Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Pasquale Minervini, Sebastian Riedel

TL;DR
This paper introduces a method for automatically generating adversarial examples in NLP that violate logical constraints, and uses this to regularize neural NLI models, improving their robustness and accuracy on adversarial datasets.
Contribution
The paper generates adversarial examples from violations of logical constraints and uses them to regularise NLI models, making the models more robust to adversarial inputs.
Findings
Improves accuracy on adversarial datasets (up to 79.6% relative improvement)
Reduces background knowledge violations in NLI models
Adversarial examples transfer across different model architectures
Abstract
Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In NLP, however, most example generation strategies produce input text by using known, pre-specified semantic transformations, requiring significant manual effort and in-depth understanding of the problem and domain. In this paper, we investigate the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference (NLI). We reduce the problem of identifying such adversarial examples to a combinatorial optimisation problem, by maximising a quantity measuring the degree of violation of such constraints and by using a language model for generating linguistically-plausible…
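The idea of maximising a constraint-violation score can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the rule (if A contradicts B, then B should contradict A), the hinge-style score, the `toy_model` stand-in for a neural NLI model, and the brute-force search over a fixed candidate list. The language-model plausibility term mentioned in the abstract is left out.

```python
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]
Model = Callable[[str, str], Probs]


def symmetry_violation(model: Model, a: str, b: str) -> float:
    """Degree to which the model breaks: contradiction(a, b) => contradiction(b, a).

    The score is positive only when the model is more confident in the
    rule body than in the rule head.
    """
    body = model(a, b)["contradiction"]
    head = model(b, a)["contradiction"]
    return max(0.0, body - head)


def most_violating_pair(model: Model,
                        candidates: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Brute-force search for the candidate pair with the largest violation."""
    return max(candidates, key=lambda pair: symmetry_violation(model, *pair))


def toy_model(premise: str, hypothesis: str) -> Probs:
    """Hypothetical stand-in for an NLI model, deliberately asymmetric."""
    table = {
        ("A man sleeps.", "A man is awake."): 0.9,
        ("A man is awake.", "A man sleeps."): 0.4,
    }
    c = table.get((premise, hypothesis), 0.1)
    rest = (1.0 - c) / 2.0
    return {"contradiction": c, "entailment": rest, "neutral": rest}


candidates = [
    ("A man sleeps.", "A man is awake."),
    ("A man is awake.", "A man sleeps."),
    ("A dog runs.", "A cat sits."),
]
worst = most_violating_pair(toy_model, candidates)
```

In this sketch, the pair returned by `most_violating_pair` is the one a regulariser would penalise: adding its violation score to the training loss pushes the model towards predictions that respect the rule.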
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques
