Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
Jan Netík, Patrícia Martinková

TL;DR
This paper introduces a transformer-based approach for modeling item difficulty in multiple-choice reading comprehension tasks. Fine-tuning on the item wording removes the need for manual feature engineering, and a multi-task variant improves performance, especially in small-sample settings.
Contribution
It proposes end-to-end transformer fine-tuning with component-wise and multi-task variants, demonstrating the effectiveness of joint encoding and auxiliary tasks in item difficulty modeling.
Findings
Joint encoding is a viable alternative to feature engineering.
Multi-task learning improves performance in small-sample settings.
Transformer fine-tuning captures wording-derived signals effectively.
Abstract
Response-free item difficulty modelling promises to reduce reliance on response-based calibration but is intrinsically difficult on reading-comprehension multiple-choice items, where difficulty depends on inferential demands across wording components. Whereas most existing approaches extract item-text features and pass them to a separate statistical or machine-learning model, we fine-tune transformer encoders end-to-end on the item wording, eliminating the manual feature engineering and preprocessing that discards information. Moreover, two extensions to this joint-encoding approach are proposed: a component-wise variant that encodes wording components separately through a shared encoder, and a multi-task variant that retains joint encoding and adds an auxiliary multiple-choice question answering objective on the shared encoder. Each method is evaluated under a Monte Carlo subsampling…
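To make the three variants in the abstract more concrete, here is a minimal sketch of how their inputs and the multi-task objective might be laid out. It is not the authors' implementation: the `Item` fields, the `[SEP]` separator string, the per-option layout of the auxiliary question-answering task, and the weighted-sum loss with `aux_weight` are all assumptions made for illustration. The encoder itself is left out, so the sketch covers only input construction and loss combination.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    # Hypothetical container for one multiple-choice item.
    passage: str
    question: str
    options: List[str]
    answer_index: int   # position of the keyed (correct) option
    difficulty: float   # difficulty target on some calibrated scale


SEP = " [SEP] "  # assumed separator; real tokenizers define their own


def joint_input(item: Item) -> str:
    """Joint encoding: all wording components form one input sequence."""
    return SEP.join([item.passage, item.question, *item.options])


def component_inputs(item: Item) -> List[str]:
    """Component-wise encoding: each component is a separate input
    that would be passed through the same shared encoder."""
    return [item.passage, item.question, *item.options]


def mcqa_inputs(item: Item) -> List[str]:
    """Auxiliary question answering (assumed layout): one sequence per
    option, each of which would be scored by the shared encoder."""
    return [SEP.join([item.passage, item.question, opt]) for opt in item.options]


def squared_error(pred: float, target: float) -> float:
    """Regression loss for the difficulty prediction."""
    return (pred - target) ** 2


def cross_entropy(logits: List[float], target_index: int) -> float:
    """Numerically stable softmax cross-entropy over option scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def multitask_loss(pred_difficulty: float, item: Item,
                   option_logits: List[float], aux_weight: float = 0.5) -> float:
    """Assumed combination: difficulty loss plus a weighted auxiliary QA loss."""
    main = squared_error(pred_difficulty, item.difficulty)
    aux = cross_entropy(option_logits, item.answer_index)
    return main + aux_weight * aux
```

In a real setup, `pred_difficulty` and `option_logits` would come from task-specific heads on top of the shared fine-tuned encoder; setting `aux_weight` to zero recovers a single-task difficulty objective.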
