Semantic and Structural Analysis of Implicit Biases in Large Language Models: An Interpretable Approach

Renhan Zhang; Lian Lian; Zhen Qi; Guiran Liu

arXiv:2508.06155 · cs.CL · August 11, 2025

TL;DR

This paper introduces an interpretable bias detection method for large language models that identifies hidden social biases through semantic and structural analysis, enhancing transparency and reliability.

Contribution

It presents a novel bias detection approach combining nested semantic representation with contextual contrast, improving interpretability and detection accuracy in LLMs.
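The summary names a nested semantic representation and a contextual contrast mechanism but does not define either. As a minimal sketch of the general idea of reading bias off the geometry of a representation space, the code below swaps in a simpler, well-known measure: a WEAT-style differential association score, s(t) = cos(t, a) - cos(t, b), for a target word t and two social attribute terms a and b. This is not the authors' method. The vectors in `TOY_VECTORS` are invented for illustration (in practice they would come from the hidden states of the model being audited), and `embed`, `differential_association`, and `mean_abs_association` are hypothetical helpers.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vector = List[float]

# Hypothetical toy vectors standing in for representations taken from a model.
TOY_VECTORS: Dict[str, Vector] = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "nurse": [0.2, 0.9, 0.5],
    "engineer": [0.9, 0.2, 0.5],
}


def embed(text: str) -> Vector:
    """Average the toy vectors of known words; unknown words are ignored."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in {text!r}")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def differential_association(target: str, attr_a: str, attr_b: str) -> float:
    """Positive: target lies closer to attr_a; negative: closer to attr_b."""
    t = embed(target)
    return cosine(t, embed(attr_a)) - cosine(t, embed(attr_b))


def mean_abs_association(targets: Iterable[str], attr_a: str, attr_b: str) -> float:
    """Average magnitude of the differential association over a target set."""
    scores = [differential_association(t, attr_a, attr_b) for t in targets]
    return sum(abs(s) for s in scores) / len(scores)


for word in ("nurse", "engineer"):
    print(word, round(differential_association(word, "he", "she"), 3))
print("mean |s|:", round(mean_abs_association(["nurse", "engineer"], "he", "she"), 3))
```

With these toy vectors, "nurse" scores negative (closer to "she") and "engineer" scores positive, with equal magnitude because the vectors were built symmetrically. Since `embed` averages word vectors, the attribute arguments can also be short context phrases rather than single words.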

Findings

01

Achieves high bias detection accuracy across multiple stereotype dimensions

02

Maintains semantic consistency and output stability during bias analysis

03

Provides transparent insights into internal bias mechanisms of language models

Abstract

This paper addresses the implicit stereotypes that may arise when large language models generate text. It proposes an interpretable bias detection method for identifying hidden social biases in model outputs, especially semantic tendencies that explicit linguistic features do not readily capture. The method combines nested semantic representation with a contextual contrast mechanism to extract latent bias features from the vector-space structure of model outputs. Using attention weight perturbation, it analyzes the model's sensitivity to specific social attribute terms, thereby revealing the semantic pathways through which bias is formed. To validate the method, the study uses the StereoSet dataset, which covers multiple stereotype dimensions, including gender, profession, religion, and race. The evaluation focuses on…
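The abstract names attention weight perturbation as the way sensitivity to social attribute terms is measured, but this summary does not say which layers, heads, or perturbation schedule are used. The sketch below is a hedged illustration under simple assumptions: a single attention head for one query position, with made-up attention scores and value vectors. It scales one token's post-softmax weight (by default to zero), renormalizes the weights, and reports the L2 change in the head output as that token's sensitivity. All names (`softmax`, `attend`, `perturb`, `token_sensitivity`, and the toy tokens) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(weights: Sequence[float], values: Sequence[Sequence[float]]) -> Vector:
    """Weighted sum of value vectors (one attention head, one query)."""
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def perturb(weights: Sequence[float], index: int, scale: float) -> Vector:
    """Scale one token's attention weight, then renormalize so weights sum to 1."""
    w = list(weights)
    w[index] *= scale
    total = sum(w)
    return [x / total for x in w]


def token_sensitivity(scores: Sequence[float], values: Sequence[Sequence[float]],
                      index: int, scale: float = 0.0) -> float:
    """L2 change in the head output when token `index` is down-weighted by `scale`."""
    weights = softmax(scores)
    base = attend(weights, values)
    moved = attend(perturb(weights, index, scale), values)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(base, moved)))


# Hypothetical toy head: attention scores and 2-d value vectors for one query.
tokens = ["the", "doctor", "said", "she"]
scores = [0.0, 0.5, 0.0, 2.0]
values = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]

sensitivity = {tok: token_sensitivity(scores, values, i) for i, tok in enumerate(tokens)}
for tok, s in sorted(sensitivity.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>7}: {s:.3f}")
```

In this toy setup the attribute term "she" carries most of the attention mass and a distinctive value vector, so it receives the highest sensitivity. Applied to a real model, the same loop would run over attention maps from many heads and layers, and the choice of perturbation (zeroing, partial scaling, or noise) is a design decision the sketch does not settle.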

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods