Knowledge Graph Guided Evaluation of Abstention Techniques

Kinshuk Vasisht; Navreet Kaur; Danish Pruthi

arXiv:2412.07430·cs.CL·February 11, 2025



TL;DR

This paper introduces SELECT, a knowledge graph-based benchmark for evaluating how well language models abstain from responding to inappropriate requests, revealing trade-offs and limitations of current abstention techniques.

Contribution

It presents a novel benchmark, SELECT, grounded in knowledge graphs, to systematically evaluate and compare abstention techniques in language models.

Findings

01 The examined abstention techniques achieve abstention rates of over 80% on target concepts.

02 Abstention rates drop by 19% for descendants of the target concepts.

03 No single technique outperforms the others across all scenarios.

Abstract

To deploy language models safely, it is crucial that they abstain from responding to inappropriate requests. Several prior studies test the safety promises of models based on their effectiveness in blocking malicious requests. In this work, we focus on evaluating the underlying techniques that cause models to abstain. We create SELECT, a benchmark derived from a set of benign concepts (e.g., "rivers") from a knowledge graph. Focusing on benign concepts isolates the effect of safety training, and grounding these concepts in a knowledge graph allows us to study the generalization and specificity of abstention techniques. Using SELECT, we benchmark different abstention techniques over six open-weight and closed-source models. We find that the examined techniques indeed cause models to abstain with over 80% abstention rates. However, these techniques are not as effective for descendants…
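As a rough illustration of why grounding concepts in a knowledge graph helps, the sketch below computes abstention rates separately for a target concept, its descendants, and unrelated concepts. It is not the authors' implementation: the toy taxonomy, the made-up outcomes in `ABSTAINED`, and the helper names `descendants` and `abstention_rate` are all hypothetical. Reading the descendant rate as a proxy for generalization and the unrelated rate as a proxy for specificity is likewise an assumption here.

```python
from collections import deque

# Hypothetical toy taxonomy (parent concept -> child concepts); not from the paper.
TAXONOMY = {
    "rivers": ["Nile", "Amazon"],
    "Nile": ["Blue Nile"],
    "lakes": ["Lake Victoria"],
}

# Made-up outcomes: True means the model abstained on questions about the concept.
ABSTAINED = {
    "rivers": True,
    "Nile": True,
    "Amazon": True,
    "Blue Nile": False,
    "lakes": False,
    "Lake Victoria": False,
}


def descendants(graph, concept):
    """Return every concept reachable from `concept` via child edges, in BFS order."""
    seen = []
    queue = deque(graph.get(concept, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(graph.get(node, []))
    return seen


def abstention_rate(abstained, concepts):
    """Fraction of `concepts` on which the model abstained (0.0 for an empty list)."""
    concepts = list(concepts)
    if not concepts:
        return 0.0
    return sum(abstained[c] for c in concepts) / len(concepts)


target = "rivers"
desc = descendants(TAXONOMY, target)
unrelated = [c for c in ABSTAINED if c != target and c not in desc]

target_rate = abstention_rate(ABSTAINED, [target])    # abstention on the target itself
descendant_rate = abstention_rate(ABSTAINED, desc)    # assumed proxy for generalization
unrelated_rate = abstention_rate(ABSTAINED, unrelated)  # assumed proxy for (lack of) specificity

print(f"target={target_rate:.2f} descendants={descendant_rate:.2f} unrelated={unrelated_rate:.2f}")
```

In this toy setup, a gap between the target rate and the descendant rate would correspond to the kind of drop the abstract reports for descendants of target concepts.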


Taxonomy

Topics: EEG and Brain-Computer Interfaces · Intravenous Infusion Technology and Safety

Methods: Direct Preference Optimization