Estimating Difficulty Levels of Programming Problems with Pre-trained Model
Zhiyuan Wang, Wei Zhang, Jun Wang

TL;DR
This paper introduces a method to automatically estimate the difficulty levels of programming problems using a combined pre-trained text and code model, reducing reliance on expert annotations and student solution data.
Contribution
The paper proposes an approach that couples a pre-trained text model with a pre-trained code model for difficulty estimation, and it contributes two new POJ datasets for the task.
Findings
On the two POJ datasets built for the task, the combined model effectively estimates problem difficulty.
Both text and code modalities contribute significantly to accuracy.
The approach reduces the need for extensive expert annotation.
Abstract
As the demand for programming skills grows across industries and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotations or take a long time to accumulate enough student solutions for each problem. To address this issue, we formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code. For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model. We built two POJ datasets for the task and the results demonstrate the effectiveness of the…
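The coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two models are fused, so plain concatenation followed by a linear classifier is assumed here. The encoders are hashed bag-of-tokens stand-ins for real pre-trained text and code models, and the names stub_encode, DifficultyEstimator, EMBED_DIM and NUM_LEVELS are hypothetical, as are the three difficulty levels and the untrained random weights.

```python
import hashlib
import math
import random

EMBED_DIM = 16   # hypothetical embedding size for each stand-in encoder
NUM_LEVELS = 3   # hypothetical number of difficulty levels (e.g. easy/medium/hard)


def stub_encode(tokens, salt, dim=EMBED_DIM):
    """Stand-in for a pre-trained encoder: hashed bag-of-tokens, L2-normalised.

    A real system would call a pre-trained text or code model here.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5((salt + tok).encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DifficultyEstimator:
    """Concatenates text and code embeddings and applies one linear layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * EMBED_DIM
        # Untrained random weights (assumption); in practice these would be
        # learned from problems labelled with difficulty levels.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(NUM_LEVELS)]
        self.bias = [0.0] * NUM_LEVELS

    def predict_proba(self, description, solution_code):
        text_vec = stub_encode(description.lower().split(), salt="text:")
        code_vec = stub_encode(solution_code.split(), salt="code:")
        fused = text_vec + code_vec  # assumed fusion: simple concatenation
        logits = [sum(w * x for w, x in zip(row, fused)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def predict(self, description, solution_code):
        probs = self.predict_proba(description, solution_code)
        return max(range(NUM_LEVELS), key=probs.__getitem__)
```

Calling DifficultyEstimator().predict(description, solution_code) returns an index into the assumed difficulty levels. With untrained weights the prediction is meaningless, so the sketch shows only the data flow from the two modalities to a single difficulty estimate.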
Taxonomy
Topics: Teaching and Learning Programming · Machine Learning and Data Classification
