TL;DR
This paper introduces a novel relational loss and a two-stage attention architecture to improve human affect and mental state estimation from videos, addressing challenges like limited data and poor temporal resolution.
Contribution
The work proposes a new relational loss for better generalisation and a two-stage attention model leveraging temporal context, advancing affect estimation methods.
Findings
Outperforms all baselines in affect and schizophrenia severity estimation.
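Achieves a Pearson correlation coefficient (PCC) of up to 78% on schizophrenia severity estimation, close to the performance of human experts.
Improves the concordance correlation coefficient (CCC) on affect datasets, surpassing previous state-of-the-art results.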
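For readers unfamiliar with the two metrics named in the findings, the sketch below computes both from their standard textbook definitions. The function names `pcc` and `ccc` and the toy numbers are illustrative assumptions; this is not the paper's evaluation code.

```python
import math


def _moments(x, y):
    """Return means, population variances and covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def pcc(x, y):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / math.sqrt(vx * vy)


def ccc(x, y):
    """Concordance correlation coefficient: like PCC, but also penalises
    differences in mean and scale between predictions and labels."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Toy data (hypothetical): predictions are perfectly correlated with the
# labels but shifted by a constant offset of 1.
predictions = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 3.0, 4.0, 5.0]
pcc_value = pcc(predictions, labels)
ccc_value = ccc(predictions, labels)
```

In this toy case PCC ignores the constant offset while CCC is reduced by it, which is one reason CCC is often preferred for continuous affect regression.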
Abstract
Automated estimation of human affect and mental state faces a number of difficulties, including learning from labels with poor or no temporal resolution, learning from few datasets with little data (often due to confidentiality constraints), and dealing with (very) long, in-the-wild videos. For these reasons, deep learning methodologies tend to overfit, that is, they arrive at latent representations with poor generalisation performance on the final regression task. To overcome this, we introduce two complementary contributions. First, we introduce a novel relational loss for multilabel regression and ordinal problems that regularises learning and leads to better generalisation. The proposed loss uses inter-relational information between label vectors to learn better latent representations by aligning batch label distances to the distances in the latent feature space. Second, we utilise a…
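The idea of aligning batch label distances with latent-space distances can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: Euclidean distances, normalisation by the largest distance in the batch, a mean squared difference, and the name `relational_loss_sketch`. It should not be read as the authors' loss.

```python
import math


def pairwise_distances(vectors):
    """Euclidean distance between every pair of vectors in a batch."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def normalise(matrix):
    """Scale a distance matrix so its largest entry is 1 (assumed choice,
    made so that label and latent distances are comparable)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return matrix
    return [[value / peak for value in row] for row in matrix]


def relational_loss_sketch(latents, labels):
    """Mean squared mismatch between normalised latent and label distances
    over all ordered pairs of distinct samples in the batch."""
    n = len(latents)
    d_latent = normalise(pairwise_distances(latents))
    d_label = normalise(pairwise_distances(labels))
    total = sum(
        (d_latent[i][j] - d_label[i][j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


# Hypothetical batch of three samples with one-dimensional labels.
labels = [[0.0], [1.0], [2.0]]
# Latents whose geometry mirrors the labels (up to scale).
aligned_latents = [[0.0, 0.0], [3.0, 0.0], [6.0, 0.0]]
# Latents where the second and third samples are swapped.
misaligned_latents = [[0.0, 0.0], [6.0, 0.0], [3.0, 0.0]]

aligned_loss = relational_loss_sketch(aligned_latents, labels)
misaligned_loss = relational_loss_sketch(misaligned_latents, labels)
```

Under these assumptions the loss is zero when the latent geometry matches the label geometry up to scale and grows when samples with similar labels sit far apart in latent space. In training, such a term would typically be added to the main regression loss as a regulariser.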
