minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
Kanishka Misra

TL;DR
minicons is an open source library that standardizes and simplifies behavioral and representational analysis of transformer language models, enabling efficient extraction of probabilities and vectors for research.
Contribution
It introduces a flexible API for analyzing transformer LMs at multiple levels, with practical applications demonstrated through case studies on BERT and other models.
Findings
Analyzed BERT's learning dynamics on relative grammatical judgments.
Benchmarked 23 LMs on zero-shot abductive reasoning.
Provided an accessible tool for the research community.
Abstract
We present minicons, an open source library that provides a standard API for researchers interested in conducting behavioral and representational analyses of transformer-based language models (LMs). Specifically, minicons enables researchers to apply analysis methods at two levels: (1) at the prediction level -- by providing functions to efficiently extract word/sentence level probabilities; and (2) at the representational level -- by also facilitating efficient extraction of word/phrase level vectors from one or more layers. In this paper, we describe the library and apply it to two motivating case studies: One focusing on the learning dynamics of the BERT architecture on relative grammatical judgments, and the other on benchmarking 23 different LMs on zero-shot abductive reasoning. minicons is available at https://github.com/kanishkamisra/minicons
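The prediction-level analysis described in the abstract, scoring sentences by their probability and comparing a grammatical sentence against a minimally different ungrammatical one, can be illustrated with a minimal sketch. This is not the minicons API: the toy corpus, the add-one-smoothed bigram model standing in for a transformer LM, and the function names train_bigram, token_logprobs, and sequence_score are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy corpus standing in for LM training data (illustrative assumption).
CORPUS = ["the dog barks", "the dogs bark", "a dog barks", "the cats bark"]

def train_bigram(corpus):
    """Count bigrams and their left contexts over whitespace tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def token_logprobs(sentence, model):
    """Word-level log P(word | previous word), with add-one smoothing."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split()
    return [
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
        for prev, cur in zip(tokens, tokens[1:])
    ]

def sequence_score(sentence, model):
    """Sentence-level log probability: the sum of word-level log probabilities."""
    return sum(token_logprobs(sentence, model))

model = train_bigram(CORPUS)
good, bad = "the dog barks", "the dog bark"   # a minimal pair
prefers_grammatical = sequence_score(good, model) > sequence_score(bad, model)
print(prefers_grammatical)
```

A relative grammatical judgment then reduces to checking whether the model assigns the grammatical sentence the higher score. With a transformer LM the per-word log probabilities would come from the model's output distribution rather than from bigram counts.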
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Adam · Attention Dropout · Residual Connection · Dense Connections
