Vision Large Language Models Are Good Noise Handlers in Engagement Analysis

Alexander Vedernikov; Puneet Kumar; Haoyu Chen; Tapio Seppänen; Xiaobai Li

arXiv:2511.14749·cs.CV·November 19, 2025

Open Access

TL;DR

This paper introduces a framework that uses Vision Large Language Models to refine noisy engagement annotations in videos, improving model training and surpassing previous state-of-the-art results on engagement benchmarks.

Contribution

It proposes a novel annotation refinement and training strategy leveraging VLMs, curriculum learning, and soft labels to handle subjective noise in engagement datasets.

Findings

01 Improved engagement recognition performance on benchmarks.

02 Enhanced model robustness with refined annotations.

03 Surpassed previous state-of-the-art results.

Abstract

Engagement recognition in video datasets, unlike traditional image classification, is hampered by subjective and noisy labels that limit model performance. To address this, we propose a framework that leverages Vision Large Language Models (VLMs) to refine annotations and guide the training process. Our framework uses a questionnaire to extract behavioral cues and split data into high- and low-reliability subsets. We also introduce a training strategy combining curriculum learning with soft label refinement, gradually incorporating ambiguous samples while adjusting supervision to reflect uncertainty. We demonstrate that classical computer vision models trained on refined high-reliability subsets and enhanced with our curriculum strategy show improvements, highlighting the benefits of addressing label subjectivity with VLMs.…
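The pipeline described in the abstract (a reliability split, then a curriculum that gradually admits ambiguous samples under softened supervision) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name and number here is an assumption made for illustration: the `Sample` fields, the four engagement classes, the agreement-plus-confidence rule for the split, the linear ramp, and the fixed 0.5 blending weight. In particular, `vlm_label` and `vlm_confidence` stand in for whatever the VLM questionnaire actually produces.

```python
from dataclasses import dataclass

NUM_CLASSES = 4  # assumption: four engagement levels


@dataclass
class Sample:
    video_id: str
    human_label: int       # original, possibly noisy annotation
    vlm_label: int         # hypothetical label derived from VLM questionnaire answers
    vlm_confidence: float  # hypothetical score in [0, 1]


def split_by_reliability(samples, min_confidence=0.7):
    """Assumed rule: a sample is high-reliability when the VLM agrees
    with the annotator and is confident; everything else is low-reliability."""
    high, low = [], []
    for s in samples:
        agrees = s.vlm_label == s.human_label
        (high if agrees and s.vlm_confidence >= min_confidence else low).append(s)
    return high, low


def soft_label(sample, trust):
    """Blend the one-hot human label with the one-hot VLM label.
    `trust` is the weight placed on the human annotation."""
    target = [0.0] * NUM_CLASSES
    target[sample.human_label] += trust
    target[sample.vlm_label] += 1.0 - trust
    return target


def curriculum_fraction(epoch, warmup_epochs, total_epochs):
    """Share of the low-reliability pool admitted at `epoch`:
    zero during warmup, then a linear ramp reaching 1.0 at the last epoch."""
    if epoch < warmup_epochs:
        return 0.0
    ramp = total_epochs - warmup_epochs
    if ramp <= 0:
        return 1.0
    return min(1.0, (epoch - warmup_epochs + 1) / ramp)


def epoch_training_set(high, low, epoch, warmup_epochs=2, total_epochs=6):
    """Return (sample, target) pairs for one epoch. High-reliability samples
    keep hard labels; admitted low-reliability samples get soft targets."""
    # Admit the low-reliability samples the VLM is most confident about first.
    ranked = sorted(low, key=lambda s: s.vlm_confidence, reverse=True)
    k = int(curriculum_fraction(epoch, warmup_epochs, total_epochs) * len(ranked))
    pairs = [(s, soft_label(s, trust=1.0)) for s in high]
    pairs += [(s, soft_label(s, trust=0.5)) for s in ranked[:k]]
    return pairs


if __name__ == "__main__":
    data = [
        Sample("a", 2, 2, 0.9),
        Sample("b", 1, 3, 0.8),
        Sample("c", 0, 0, 0.4),
        Sample("d", 3, 2, 0.6),
    ]
    high, low = split_by_reliability(data)
    for epoch in range(6):
        print(epoch, [s.video_id for s, _ in epoch_training_set(high, low, epoch)])
```

In a real training loop the soft targets would feed a cross-entropy loss over class probabilities, and the blending weight would more plausibly be adjusted per sample or per epoch than held fixed as it is here.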

Taxonomy

Topics: Emotion and Mood Recognition · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)