SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts

Aihua Pei; Zehua Yang; Shunan Zhu; Ruoxi Cheng; Ju Jia

arXiv:2412.00765·cs.CL·December 3, 2024

Open Access

TL;DR

This paper presents SelfPrompt, a framework for autonomous LLM robustness evaluation using domain-specific knowledge graphs and refined adversarial prompts, reducing reliance on traditional benchmarks and enabling targeted assessments.

Contribution

Introduces SelfPrompt, a novel self-evaluation framework that generates domain-constrained adversarial prompts for LLM robustness assessment without external benchmarks.

Findings

01. SelfPrompt is effective at evaluating the robustness of ChatGPT, Llama-3.1, Phi-3, and Mistral.

02. It reduces dependency on traditional evaluation datasets.

03. It provides targeted robustness insights in specific domains.

Abstract

Traditional methods for evaluating the robustness of large language models (LLMs) often rely on standardized benchmarks, which can escalate costs and limit evaluations across varied domains. This paper introduces a novel framework designed to autonomously evaluate the robustness of LLMs by incorporating refined adversarial prompts and domain-constrained knowledge guidelines in the form of knowledge graphs. Our method systematically generates descriptive sentences from domain-constrained knowledge graph triplets to formulate adversarial prompts, enhancing the relevance and challenge of the evaluation. These prompts, generated by the LLM itself and tailored to evaluate its own robustness, undergo a rigorous filtering and refinement process, ensuring that only those with high textual fluency and semantic fidelity are used. This self-evaluation mechanism allows the LLM to evaluate its…
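The pipeline described in the abstract (triplet, then descriptive sentence, then filtered adversarial candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Triplet`, `describe`, `token_overlap`, and `filter_candidates` are hypothetical, the template-based sentence builder stands in for the LLM's own generation step, and word-set Jaccard overlap is only a crude proxy for the semantic-fidelity check. The fluency check and the 0.5 threshold are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A knowledge graph fact: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def describe(t: Triplet) -> str:
    """Template-based stand-in for generating a descriptive sentence."""
    return f"{t.subject} {t.predicate} {t.obj}."


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (assumed fidelity proxy)."""
    wa = set(a.lower().rstrip(".").split())
    wb = set(b.lower().rstrip(".").split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def filter_candidates(original: str, candidates: list[str],
                      min_fidelity: float = 0.5) -> list[str]:
    """Keep candidates that differ from the original yet stay close to it."""
    return [
        c for c in candidates
        if c != original and token_overlap(original, c) >= min_fidelity
    ]


if __name__ == "__main__":
    sentence = describe(Triplet("Aspirin", "treats", "headache"))
    candidates = [
        "Aspirin reliably treats headache.",  # small perturbation, kept
        "Bananas are yellow.",                # unrelated, dropped
        "Aspirin treats headache.",           # identical to original, dropped
    ]
    print(filter_candidates(sentence, candidates))
```

In a fuller version, the candidate sentences would come from the model under evaluation, and the overlap function would be replaced by whatever fluency and fidelity measures the evaluation actually uses.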

Taxonomy

Topics: Adversarial Robustness in Machine Learning