Test-Time Adaptation for Video Highlight Detection Using Meta-Auxiliary Learning and Cross-Modality Hallucinations

Zahidul Islam; Sujoy Paul; Mrigank Rochan

arXiv:2508.04924·cs.CV·August 8, 2025

TL;DR

This paper introduces Highlight-TTA, a test-time adaptation framework for video highlight detection. It adapts the model to each test video during testing, using a meta-auxiliary learning scheme with cross-modality hallucination as the auxiliary task, to improve generalization and highlight detection performance.

Contribution

The paper proposes a test-time adaptation method that combines meta-auxiliary learning with cross-modality hallucinations to improve video highlight detection on unseen videos.

Findings

01. Improves highlight detection performance across multiple models and datasets.

02. Enhances model generalization to diverse and unseen test videos.

03. Achieves superior results compared to baseline methods.

Abstract

Existing video highlight detection methods, although advanced, struggle to generalize well to all test videos. These methods typically employ a generic highlight detection model for each test video, which is suboptimal as it fails to account for the unique characteristics and variations of individual test videos. Such fixed models do not adapt to the diverse content, styles, or audio and visual qualities present in new, unseen test videos, leading to reduced highlight detection performance. In this paper, we propose Highlight-TTA, a test-time adaptation framework for video highlight detection that addresses this limitation by dynamically adapting the model during testing to better align with the specific characteristics of each test video, thereby improving generalization and highlight detection performance. Highlight-TTA is jointly optimized with an auxiliary task, cross-modality…
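
The adaptation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract is truncated before it defines the auxiliary task, so the sketch assumes that cross-modality hallucination means predicting a frame's audio features from its visual features. It also assumes tiny linear models, and it omits the meta-training stage entirely. All names (`adapt`, `aux_loss`, `highlight_scores`, `base_W`, `H`, `u`) are hypothetical.

```python
import copy

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def aux_loss(W, H, frames):
    """Assumed auxiliary task: mean squared error between real audio
    features and audio features hallucinated from visual features."""
    total = 0.0
    for v, a in frames:
        a_hat = matvec(H, matvec(W, v))
        total += sum((p - q) ** 2 for p, q in zip(a_hat, a))
    return total / len(frames)

def adapt(W, H, frames, lr=0.05, steps=20):
    """Return a per-video copy of the shared encoder W after a few
    gradient steps on the auxiliary loss. No highlight labels are used."""
    W = copy.deepcopy(W)  # the generic model itself is left untouched
    n = len(frames)
    for _ in range(steps):
        grad = [[0.0] * len(W[0]) for _ in W]
        for v, a in frames:
            err = [p - q for p, q in zip(matvec(H, matvec(W, v)), a)]
            # Back-propagate the error through the auxiliary head: g = H^T err
            g = [sum(H[i][k] * err[i] for i in range(len(H)))
                 for k in range(len(W))]
            for k in range(len(W)):
                for j in range(len(v)):
                    grad[k][j] += 2.0 * g[k] * v[j] / n
        W = [[w - lr * d for w, d in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
    return W

def highlight_scores(W, u, frames):
    """Primary task: one highlight score per frame from the shared encoder."""
    return [sum(ui * zi for ui, zi in zip(u, matvec(W, v))) for v, _ in frames]

# Toy test video: (visual features, audio features) per frame. Made-up values.
frames = [([1.0, 0.0], [0.5]), ([0.0, 1.0], [1.0]), ([1.0, 1.0], [1.5])]
base_W = [[1.0, 0.0], [0.0, 1.0]]  # shared encoder (hypothetical weights)
H = [[1.0, 0.5]]                   # auxiliary hallucination head
u = [0.3, -0.2]                    # primary highlight head

adapted_W = adapt(base_W, H, frames)
scores = highlight_scores(adapted_W, u, frames)
```

In this sketch only the shared encoder is updated, and the update starts from a fresh copy for each video, so adapting to one test video does not carry over to the next. Whether Highlight-TTA updates the same subset of parameters is not stated in the text above.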
