MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset
Sagi Shaier, George Arthur Baker, Chiranthan Sridhar, Lawrence E Hunter, Katharina von der Wense

TL;DR
MALAMUTE is a multilingual, template-free probing dataset that evaluates language models' knowledge of specific educational domains at high granularity. It reveals significant gaps in their understanding of university-level concepts.
Contribution
This paper introduces MALAMUTE, the first education-focused, highly granular, multilingual probing dataset that overcomes limitations of existing benchmarks by being template-free and covering detailed curriculum concepts.
Findings
Language models show significant knowledge gaps in specific educational topics.
MALAMUTE covers 8 domains across 3 languages (English, Spanish, and Polish) with over 33,000 concepts.
Performance varies across models, indicating room for improvement before they are used in educational applications.
Abstract
Language models (LMs) have excelled in various broad domains. However, to ensure their safe and effective integration into real-world educational settings, they must demonstrate proficiency in specific, granular areas of knowledge. Existing cloze-style benchmarks, commonly used to evaluate LMs' knowledge, have three major limitations. They: 1) do not cover the educational domain; 2) typically focus on low-complexity, generic knowledge or broad domains, which do not adequately assess the models' knowledge in specific subjects; and 3) often rely on templates that can bias model predictions. Here, we introduce MALAMUTE, a multilingual, template-free, and highly granular probing dataset comprising expert-written, peer-reviewed probes from 71 university-level textbooks across three languages (English, Spanish, and Polish). MALAMUTE is the first education-based cloze-style dataset. It covers…
Taxonomy
Topics: Digital Media Forensic Detection · Handwritten Text Recognition Techniques · Machine Learning and Data Classification
