MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks

Dumitran Adrian Marius; Theodor-Pierre Moroianu; Buca Mihnea-Vicentiu

arXiv:2507.03162 · cs.CY · September 30, 2025

Open Access

TL;DR

This paper introduces MateInfoUB, a bilingual multimodal dataset for testing LLMs in complex CS educational tasks, revealing their strengths and limitations in multilingual and multimodal contexts.

Contribution

It presents a novel bilingual, multimodal dataset based on high-level CS competition questions and systematically evaluates LLMs, highlighting their performance and challenges in educational settings.

Findings

01. LLMs perform variably across languages and modalities.

02. Language choice impacts LLM reasoning capabilities.

03. The dataset enables assessment of LLMs in realistic CS tasks.

Abstract

The rapid advancement of Large Language Models (LLMs) has transformed various domains, particularly computer science (CS) education. These models exhibit remarkable capabilities in code-related tasks and problem-solving, raising questions about their potential and limitations in advanced CS contexts. This study presents a novel bilingual (English-Romanian) multimodal (text and image) dataset of multiple-choice questions derived from a high-level computer science competition. A distinctive feature of our dataset is that some problems are more easily solved by reasoning on paper, while others are solved more efficiently by writing code. We systematically evaluate state-of-the-art LLMs on this dataset, analyzing their performance on theoretical programming tasks. Our findings reveal the strengths and limitations of current LLMs, including the influence of language choice…

Taxonomy

Topics: Natural Language Processing Techniques