Blackbird Language Matrices: A Framework to Investigate the Linguistic Competence of Language Models
Paola Merlo, Chunyang Jiang, Giuseppe Samo, Vivi Nastase

TL;DR
The paper introduces the Blackbird Language Matrices (BLM), a structured language task designed to evaluate linguistic competence and systematicity in large language models through curated, multi-level multiple-choice datasets.
Contribution
It presents a novel, structured language task and dataset (BLM) for probing linguistic and reasoning abilities of language models, including their detection of linguistic objects and systematic patterns.
Findings
BLMs are challenging, but can be solved at good levels of performance in more than one language
Representations in LLMs contain relevant grammatical objects and attributes
Models detect and utilize systematic patterns across sentences
Abstract
This article describes a novel language task, the Blackbird Language Matrices (BLM) task, inspired by intelligence tests, and illustrates the BLM datasets, their construction and benchmarking, and targeted experiments on chunking and systematicity. BLMs are multiple-choice problems, structured at multiple levels: within each sentence, across the input sequence, within each candidate answer. Because of their rich structure, these curated but naturalistic datasets are key to answering some core questions about the abilities of current large language models: do LLMs detect linguistic objects and their properties? Do they detect and use systematic patterns across sentences? Are they more prone to linguistic or reasoning errors, and how do these interact? We show that BLMs, while challenging, can be solved at good levels of performance, in more than one language, with simple baseline models or, at…
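As a rough illustration of the multiple-choice structure the abstract describes (an input sequence of sentences plus a set of candidate answers), the sketch below shows one way such a problem could be held in memory and scored. The class name `BLMInstance`, its fields, and the `accuracy` helper are hypothetical and are not taken from the paper or its released data; the strings in the usage note are placeholders, not real BLM items.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BLMInstance:
    """One multiple-choice problem (hypothetical layout, not the authors' format)."""
    context: List[str]      # input sequence of sentences
    candidates: List[str]   # candidate answer sentences
    answer_index: int       # position of the correct candidate

    def __post_init__(self) -> None:
        # Reject instances whose answer does not point at a candidate.
        if not 0 <= self.answer_index < len(self.candidates):
            raise ValueError("answer_index must point at a candidate")


def accuracy(instances: List[BLMInstance],
             choose: Callable[[BLMInstance], int]) -> float:
    """Fraction of instances where `choose` returns the correct candidate index."""
    if not instances:
        return 0.0
    correct = sum(1 for inst in instances if choose(inst) == inst.answer_index)
    return correct / len(instances)
```

Any model under test would be wrapped as a `choose` function that maps an instance to a candidate index, for example `accuracy(data, lambda inst: 0)` for a trivial baseline that always picks the first candidate.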
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
