Explaining Black-box Language Models with Knowledge Probing Systems: A Post-hoc Explanation Perspective

Yunxiao Zhao; Hao Xu; Zhiqiang Wang; Xiaoli Li; Jiye Liang; Ru Li

arXiv:2508.16969·cs.CL·August 26, 2025

TL;DR

This paper introduces KnowProb, a post-hoc probing system that explains black-box language models by testing whether they understand implicit knowledge beyond a text's surface content, exposing their limitations and aiding interpretability.

Contribution

The paper proposes a novel knowledge-guided probing method, KnowProb, to evaluate and explain the implicit knowledge in black-box language models from multiple perspectives.

Findings

01. Current PLMs primarily learn a single distribution of representations.

02. PLMs face challenges in capturing hidden knowledge behind text.

03. KnowProb effectively identifies limitations of black-box models.

Abstract

Pre-trained Language Models (PLMs) are trained on large amounts of unlabeled data, yet they exhibit remarkable reasoning skills. However, the trustworthiness challenges posed by these black-box models have become increasingly evident in recent years. To alleviate this problem, this paper proposes KnowProb, a novel knowledge-guided probing approach that works as a post-hoc explanation: it probes whether black-box PLMs understand implicit knowledge beyond the given text, rather than only the text's surface-level content. We provide six potential explanations derived from the underlying content of the given text: three based on knowledge-based understanding and three on association-based reasoning. In experiments, we validate that current PLMs, whether small-scale or large-scale, learn only a single distribution of representations, and still face significant challenges in…
