Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game
Clemens Witt, Thiemo Leonhardt, Nadine Bergner, Mareen Grillenberger

TL;DR
This paper introduces a multimodal late fusion model combining visual and action data to classify problem-solving strategies in educational games, demonstrating improved accuracy over single-modality models in a student study.
Contribution
The paper presents a multimodal late fusion approach that integrates screencast-based visual data with structured in-game action sequences to improve strategy classification accuracy in digital learning environments.
Findings
Fusion model outperformed unimodal models by over 15% in accuracy.
Multimodal approach improves strategy detection in educational games.
Results support multimodal ML for adaptive learning assessments.
Abstract
Machine learning models are widely used to support stealth assessment in digital learning environments. Existing approaches typically rely on abstracted gameplay log data, which may overlook subtle behavioral cues linked to learners' cognitive strategies. This paper proposes a multimodal late fusion model that integrates screencast-based visual data and structured in-game action sequences to classify students' problem-solving strategies. In a pilot study with secondary school students (N=149) playing a multitouch educational game, the fusion model outperformed unimodal baseline models, increasing classification accuracy by over 15%. Results highlight the potential of multimodal ML for strategy-sensitive assessment and adaptive support in interactive learning contexts.
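To make the late fusion idea concrete, here is a minimal sketch in which two unimodal classifiers each produce per-class probabilities and the fused prediction is their weighted average. The abstract does not state which fusion rule, class labels, or weights the authors used, so the averaging rule, the function names `late_fusion` and `predict`, the `visual_weight` parameter, and the three strategy labels are all assumptions made for illustration.

```python
from typing import List, Sequence


def late_fusion(visual_probs: Sequence[float],
                action_probs: Sequence[float],
                visual_weight: float = 0.5) -> List[float]:
    """Fuse two per-class probability vectors by weighted averaging.

    Assumption: weighted averaging is only one possible late fusion rule;
    the paper's actual rule is not given in the abstract.
    """
    if len(visual_probs) != len(action_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    w = visual_weight
    return [w * v + (1.0 - w) * a for v, a in zip(visual_probs, action_probs)]


def predict(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


# Hypothetical strategy labels and model outputs, for illustration only.
LABELS = ["trial_and_error", "systematic", "hypothesis_driven"]
visual = [0.50, 0.30, 0.20]  # output of a screencast-based model (made up)
action = [0.10, 0.60, 0.30]  # output of an action-sequence model (made up)

fused = late_fusion(visual, action)
print(predict(fused, LABELS))
```

In this toy example the visual model alone would pick the first label, while the fused scores favor the second, which shows how the second modality can overturn a unimodal decision. Because each modality keeps its own model, either one can be retrained or reweighted without touching the other.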
