ConceptX: A Framework for Latent Concept Analysis
Firoj Alam, Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Abdul Rafae Khan, Jia Xu

TL;DR
ConceptX is a human-in-the-loop framework that uncovers, visualizes, and annotates latent concepts in pre-trained language models, aiding interpretability and bias detection in NLP models.
Contribution
It introduces an unsupervised method for discovering concepts and a graphical interface for human annotation, including auto-annotations based on linguistic ontologies.
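The two steps named above, unsupervised concept discovery followed by auto-annotation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: this summary does not name the clustering algorithm, so average-linkage agglomerative clustering is used as a stand-in; the two-dimensional vectors are made-up substitutes for real pLM representations; and the tiny part-of-speech lexicon, the function names, and the 0.9 agreement threshold are all hypothetical.

```python
from collections import Counter
from math import dist


def average_linkage(a, b, vectors):
    """Mean pairwise Euclidean distance between two clusters of indices."""
    return sum(dist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative_clusters(vectors, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], vectors)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def auto_annotate(cluster, words, lexicon, threshold=0.9):
    """Return the majority tag if enough members share it, else None.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    tags = Counter(lexicon.get(words[i], "UNK") for i in cluster)
    tag, count = tags.most_common(1)[0]
    return tag if count / len(cluster) >= threshold else None


# Toy stand-ins for contextual representations (assumed, not real model output).
words = ["he", "she", "him", "ran", "walked", "jumped"]
vectors = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
lexicon = {"he": "PRON", "she": "PRON", "him": "PRON",
           "ran": "VERB", "walked": "VERB", "jumped": "VERB"}

clusters = agglomerative_clusters(vectors, k=2)
concepts = [sorted(words[i] for i in c) for c in clusters]
labels = [auto_annotate(c, words, lexicon) for c in clusters]
```

In this sketch, a cluster that gets no label (returns None) is the kind of group a human annotator would inspect and explain by hand in the graphical interface.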
Findings
Discovered diverse linguistic and task-specific concepts in language models.
Enabled annotation of bias-related concepts such as gender and religious connotations.
Provided a resource for understanding and mitigating biases in NLP models.
Abstract
The opacity of deep neural networks remains a challenge in deploying solutions where explanation is as important as precision. We present ConceptX, a human-in-the-loop framework for interpreting and annotating latent representational space in pre-trained Language Models (pLMs). We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts. To facilitate the process, we provide auto-annotations of the concepts (based on traditional linguistic ontologies). Such annotations enable development of a linguistic resource that directly represents latent concepts learned within deep NLP models. These include not just traditional linguistic concepts, but also task-specific or sensitive concepts (words grouped based on gender or religious connotation) that help the annotators to mark bias in the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
