Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding

Junlong Ren; Gangjian Zhang; Haifeng Sun; Hao Wang

arXiv:2501.06746 · cs.CV · January 15, 2025

TL;DR

This paper introduces a novel training framework for temporal sentence grounding in videos that uses diversified data augmentation and domain adaptation to improve generalization and reduce bias, achieving state-of-the-art results.

Contribution

The paper proposes a new method combining diversified data augmentation with a domain discriminator to address temporal bias and improve model robustness in video grounding.

Findings

01. Achieves state-of-the-art performance on the Charades-CD and ActivityNet-CD datasets.
02. Effectively reduces temporal bias and improves generalization across different grounding structures.
03. Demonstrates robustness in videos with varying lengths and target moment locations.

Abstract

Temporal sentence grounding in videos (TSGV) faces challenges due to public TSGV datasets containing significant temporal biases, which are attributed to the uneven temporal distributions of target moments. Existing methods generate augmented videos, where target moments are forced to have varying temporal locations. However, since the video lengths of the given datasets have small variations, only changing the temporal locations results in poor generalization ability in videos with varying lengths. In this paper, we propose a novel training framework complemented by diversified data augmentation and a domain discriminator. The data augmentation generates videos with various lengths and target moment locations to diversify temporal distributions. However, augmented videos inevitably exhibit distinct feature distributions which may introduce noise. To address this, we design a domain…
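To make the augmentation idea concrete, here is a minimal sketch of how a video could be given both a new length and a new target moment location. It is an illustration only, not the authors' procedure: the function name `augment_video`, the use of a `donor` video as filler, and the crop-then-pad strategy are all assumptions introduced here. Videos are modeled as plain lists of per-clip features, and the target moment is the half-open index range `[start, end)`.

```python
import random


def augment_video(clips, start, end, donor, rng):
    """Return (new_clips, new_start, new_end) with a changed length and target location.

    Hypothetical helper: crops a random amount of background around the target
    moment, then pads both sides with clips from a donor video (assumed filler).
    """
    before, target, after = clips[:start], clips[start:end], clips[end:]

    # Shorten: keep a random amount of background adjacent to the target.
    keep_before = rng.randint(0, len(before))
    keep_after = rng.randint(0, len(after))
    before = before[len(before) - keep_before:]
    after = after[:keep_after]

    # Lengthen: prepend and append a random number of donor clips.
    pad_front = donor[:rng.randint(0, len(donor))]
    pad_back = donor[:rng.randint(0, len(donor))]

    new_clips = pad_front + before + target + after + pad_back
    new_start = len(pad_front) + len(before)
    new_end = new_start + len(target)
    return new_clips, new_start, new_end


# Usage: a 10-clip video whose target moment covers clips 3..5.
rng = random.Random(0)
video = [f"v{i}" for i in range(10)]
donor = ["d0", "d1", "d2"]
aug, s, e = augment_video(video, 3, 6, donor, rng)
# The target moment itself is preserved; only its surroundings change.
assert aug[s:e] == video[3:6]
```

Because the target clips are never altered, the query sentence still describes the same content, while the total length and the relative position of the moment vary from sample to sample. Features assembled this way come from a different distribution than the original videos, which is the mismatch the abstract says the domain discriminator is meant to address.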

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Vision and Imaging