Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation

Tomas Bueno Momcilovic; Beat Buesser; Giulio Zizzo; Mark Purcell; Dian Balta

arXiv:2410.07962·cs.AI·October 11, 2024

Open Access

TL;DR

This paper presents an ontology-driven argumentation framework for assuring the adversarial robustness of large language models. Ontologies formalize attacks and defenses, yielding both a human-readable assurance case and a machine-readable representation.

Contribution

The paper introduces a novel formal argumentation approach that uses ontologies to structure attacks and defenses, supporting the assurance of LLM adversarial robustness.

Findings

01 Structures state-of-the-art attacks and defenses using ontologies.

02 Creates a human-readable assurance case for LLM robustness.

03 Demonstrates applicability with examples in English language and code translation tasks.

Abstract

Despite the impressive adaptability of large language models (LLMs), challenges remain in ensuring their security, transparency, and interpretability. Given their susceptibility to adversarial attacks, LLMs need to be defended with an evolving combination of adversarial training and guardrails. However, managing the implicit and heterogeneous knowledge for continuously assuring robustness is difficult. We introduce a novel approach for assurance of the adversarial robustness of LLMs based on formal argumentation. Using ontologies for formalization, we structure state-of-the-art attacks and defenses, facilitating the creation of a human-readable assurance case, and a machine-readable representation. We demonstrate its application with examples in English language and code translation tasks, and provide implications for theory and practice, by targeting engineers, data scientists, users,…
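As a rough illustration of the idea in the abstract, the sketch below pairs a human-readable assurance case with a machine-readable one. It is not the authors' ontology or tooling: the class names (`Attack`, `Defense`, `AssuranceCase`), the field names, and the attack and defense labels in the example are all hypothetical placeholders, and plain Python dataclasses with JSON stand in for a real ontology language.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class Attack:
    name: str  # hypothetical attack label
    task: str  # hypothetical task label, e.g. "code_translation"


@dataclass(frozen=True)
class Defense:
    name: str         # hypothetical defense label
    kind: str         # e.g. "adversarial_training" or "guardrail"
    mitigates: tuple  # names of attacks this defense is claimed to counter


@dataclass
class AssuranceCase:
    claim: str
    attacks: list = field(default_factory=list)
    defenses: list = field(default_factory=list)

    def uncovered(self):
        """Return names of attacks that no listed defense claims to mitigate."""
        covered = {name for d in self.defenses for name in d.mitigates}
        return [a.name for a in self.attacks if a.name not in covered]

    def to_text(self):
        """Render a human-readable summary of the argument."""
        lines = [f"Claim: {self.claim}"]
        for a in self.attacks:
            by = [d.name for d in self.defenses if a.name in d.mitigates]
            status = "mitigated by " + ", ".join(by) if by else "NOT mitigated"
            lines.append(f"  Attack '{a.name}' ({a.task}): {status}")
        return "\n".join(lines)

    def to_json(self):
        """Render a machine-readable representation of the same argument."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative data only; none of these labels come from the paper.
example = AssuranceCase(
    claim="The model is robust against the listed attacks",
    attacks=[
        Attack("prompt_injection", "english_language"),
        Attack("adversarial_suffix", "code_translation"),
    ],
    defenses=[Defense("input_filter", "guardrail", ("prompt_injection",))],
)
print(example.to_text())
print(example.uncovered())
```

Keeping both renderings derived from one data structure is the design point: when a new attack is added without a matching defense, the gap shows up in the text summary and in `uncovered()` at the same time.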

Taxonomy

Topics: Adversarial Robustness in Machine Learning