Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval

Haojian Huang; Kaijing Ma; Jin Chen; Haodong Chen; Zhou Wu; Xianghao Zang; Han Fang; Chao Ban; Hao Sun; Mulin Chen; Zhongjiang He

arXiv:2512.00953 · cs.CV · December 2, 2025

TL;DR

This paper introduces DEMR, a novel framework that improves temporal-semantic robustness in video moment retrieval by addressing modality imbalance and uncertainty estimation issues through cross-modal alignment, query reconstruction, and a Geom-regularizer.

Contribution

The paper proposes DEMR, a method that combines a Reflective Flipped Fusion block, query reconstruction, and a Geom-regularizer to improve robustness and uncertainty estimation in moment retrieval.

Findings

01 Significant improvements on the ActivityNet-CD and Charades-CD datasets.

02 Enhanced robustness and interpretability in moment retrieval.

03 Better uncertainty calibration and modality alignment.

Abstract

In the domain of moment retrieval, accurately identifying temporal segments within videos based on natural language queries remains challenging. Traditional methods often employ pre-trained models that struggle with fine-grained information and deterministic reasoning, leading to difficulties in aligning with complex or ambiguous moments. To overcome these limitations, we explore Deep Evidential Regression (DER) to construct a vanilla Evidential baseline. However, this approach encounters two major issues: the inability to effectively handle modality imbalance and the structural differences in DER's heuristic uncertainty regularizer, which adversely affect uncertainty estimation. This misalignment results in high uncertainty being incorrectly associated with accurate samples rather than challenging ones. Our observations indicate that existing methods lack the adaptability required for…
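
For readers unfamiliar with Deep Evidential Regression, the following is a minimal sketch of the standard DER objective as it is commonly formulated in the evidential regression literature. It is not DEMR's implementation and does not reproduce the Geom-regularizer. The function names and the scalar, single-sample interface are assumptions made for illustration. The network is assumed to output the four parameters of a Normal-Inverse-Gamma distribution (`gamma`, `nu`, `alpha`, `beta`) for each regression target, such as a moment boundary.

```python
import math


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of target y under the NIG evidential prior.

    Assumes nu > 0, alpha > 1, beta > 0 (hypothetical scalar interface).
    """
    omega = 2.0 * beta * (1.0 + nu)
    return (
        0.5 * math.log(math.pi / nu)
        - alpha * math.log(omega)
        + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
        + math.lgamma(alpha)
        - math.lgamma(alpha + 0.5)
    )


def evidence_regularizer(y, gamma, nu, alpha):
    """Heuristic DER regularizer: absolute error scaled by total evidence."""
    return abs(y - gamma) * (2.0 * nu + alpha)


def uncertainties(nu, alpha, beta):
    """Return (aleatoric, epistemic) variance estimates from NIG parameters."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic
```

Note that `evidence_regularizer` is exactly zero whenever the prediction error is zero, whatever the evidence terms `nu` and `alpha` are. The heuristic penalty therefore places no direct constraint on the evidence assigned to accurate predictions, which may help in reading the abstract's remark about uncertainty being mismatched with sample difficulty.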

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques