Addressing Data Scarcity in Multimodal User State Recognition by Combining Semi-Supervised and Supervised Learning
Hendric Voß, Heiko Wersing, Stefan Kopp

TL;DR
This paper introduces a multimodal machine learning method that detects user dis-/agreement and confusion in human-robot interaction from only a small amount of manually annotated data. Combining semi-supervised and supervised learning makes it more robust than purely supervised methods.
Contribution
It presents a novel approach that combines semi-supervised and supervised learning for user state recognition with minimal labeled data in multimodal settings.
Findings
Achieved an average F1-score of 81.1% in dis-/agreement detection.
Demonstrated improved robustness over purely supervised methods.
Utilized a new preprocessing pipeline for multimodal data.
Abstract
Detecting mental states of human users is crucial for the development of cooperative and intelligent robots, as it enables the robot to understand the user's intentions and desires. Despite their importance, it is difficult to obtain a large amount of high quality data for training automatic recognition algorithms as the time and effort required to collect and label such data is prohibitively high. In this paper we present a multimodal machine learning approach for detecting dis-/agreement and confusion states in a human-robot interaction environment, using just a small amount of manually annotated data. We collect a data set by conducting a human-robot interaction study and develop a novel preprocessing pipeline for our machine learning approach. By combining semi-supervised and supervised architectures, we are able to achieve an average F1-score of 81.1% for dis-/agreement detection…
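For readers unfamiliar with the metric, the following is a minimal sketch of how an averaged F1-score can be computed for a two-class dis-/agreement task. The abstract does not say how the average is taken, so the macro average over classes used here is an assumption, and the labels and predictions are made-up illustrative values, not data from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme).

    Expects two non-empty sequences of equal length.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["agree", "agree", "disagree", "disagree"]
y_pred = ["agree", "disagree", "disagree", "disagree"]
print(round(macro_f1(y_true, y_pred), 4))
```

Other averaging schemes (for example, averaging over cross-validation folds or weighting classes by frequency) would give different numbers from the same predictions.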
