Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

Kaijing Ma; Haojian Huang; Jin Chen; Haodong Chen; Pengliang Ji; Xianghao Zang; Han Fang; Chao Ban; Hao Sun; Mulin Chen; Xuelong Li

arXiv:2408.16272·cs.CV·August 30, 2024

TL;DR

This paper introduces SRAM, a novel network module for Video Temporal Grounding that incorporates Deep Evidential Regression to explicitly quantify uncertainty, improving robustness and interpretability in open-world scenarios.

Contribution

It presents the first successful application of Deep Evidential Regression in Video Temporal Grounding, with a new Geom-regularizer to enhance uncertainty estimation.

Findings

01. Improved robustness to noisy and out-of-distribution data.

02. Enhanced interpretability through explicit uncertainty quantification.

03. Effectiveness demonstrated in extensive experiments.

Abstract

Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. This leads to unreliable predictions for noisy, corrupted, and out-of-distribution data. Adapting VTG models to dynamically estimate uncertainties based on user input can address this issue. To this end, we introduce SRAM, a robust network module that benefits from a two-stage cross-modal alignment task. More importantly, it integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training, thus allowing the model to say "I do not know" in scenarios beyond its handling capacity. However, the direct application of traditional DER theory and its regularizer reveals structural flaws, leading to unintended constraints in VTG tasks. In response, we develop a simple yet effective…
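To make the "I do not know" behaviour concrete, the following is a minimal sketch of how standard Deep Evidential Regression turns four network outputs into a prediction and two uncertainty estimates. It follows the generic Normal-Inverse-Gamma formulation of DER, not SRAM's actual code or its modified regularizer. The function names, the softplus parameter mapping, the small `EPS` constant, and the `max_epistemic` threshold are all illustrative assumptions.

```python
import math

EPS = 1e-6  # assumed small constant to keep parameters strictly positive


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def nig_parameters(raw_gamma, raw_nu, raw_alpha, raw_beta):
    """Map four unconstrained outputs to Normal-Inverse-Gamma parameters.

    Constraints in standard DER: nu > 0, alpha > 1, beta > 0.
    """
    gamma = raw_gamma                      # predicted value (e.g. a boundary offset)
    nu = softplus(raw_nu) + EPS
    alpha = softplus(raw_alpha) + 1.0 + EPS
    beta = softplus(raw_beta) + EPS
    return gamma, nu, alpha, beta


def uncertainties(nu, alpha, beta):
    """Return (aleatoric, epistemic) uncertainty for NIG parameters."""
    aleatoric = beta / (alpha - 1.0)          # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]
    return aleatoric, epistemic


def predict_or_abstain(raw_outputs, max_epistemic=1.0):
    """Return the prediction, or None when epistemic uncertainty is too high.

    max_epistemic is a hypothetical threshold chosen for illustration.
    """
    gamma, nu, alpha, beta = nig_parameters(*raw_outputs)
    aleatoric, epistemic = uncertainties(nu, alpha, beta)
    if epistemic > max_epistemic:
        return None, aleatoric, epistemic
    return gamma, aleatoric, epistemic
```

With strong evidence (large `nu` and `alpha`) the epistemic term shrinks and the prediction is returned. With weak evidence it grows and the sketch abstains, which is the behaviour the abstract describes in words.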

Code & Models

Repositories

KaijingOfficial/sram_vtg (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization