Self-supervised Learning for Semi-supervised Temporal Language Grounding

Fan Luo; Shaoxiang Chen; Jingjing Chen; Zuxuan Wu; Yu-Gang Jiang

arXiv:2109.11475 · cs.CV · December 7, 2021 · 1 citation

Open Access

TL;DR

This paper introduces S^4TLG, a semi-supervised approach for temporal language grounding that leverages self-supervised learning and pseudo labels to reduce annotation costs while maintaining high performance.

Contribution

The paper proposes a semi-supervised framework for TLG that combines pseudo label generation with contrastive self-supervised learning.

Findings

01. Achieves competitive results with limited annotations

02. Effective pseudo label generation from teacher models

03. Improves feature representations via contrastive learning
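
To make the contrastive-learning finding concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss in plain Python. It is not the paper's objective: the function name `info_nce_loss`, the `temperature` value, and the toy feature vectors are all assumptions chosen for illustration, and the summary above does not say which contrastive formulation S^4TLG uses.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact loss).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row in the batch acts as a negative.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the matching candidate.
        total += log_denominator - logits[i]
    return total / len(a)


# Hypothetical toy features: three orthogonal unit vectors.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = info_nce_loss(features, features)
mismatched = info_nce_loss(features, [features[1], features[2], features[0]])
```

With correctly paired rows the loss is close to zero, while pairing each anchor with the wrong row yields a much larger loss. That gap is what pushes matching representations together and non-matching ones apart during training.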

Abstract

Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video. TLG is inherently a challenging task, as it requires comprehensive understanding of both sentence semantics and video contents. Previous works either tackle this task in a fully-supervised setting that requires a large amount of temporal annotations or in a weakly-supervised setting that usually cannot achieve satisfactory performance. Since manual annotations are expensive, to cope with limited annotations, we tackle TLG in a semi-supervised way by incorporating self-supervised learning, and propose Self-Supervised Semi-Supervised Temporal Language Grounding (S^4TLG). S^4TLG consists of two parts: (1) A pseudo label generation module that adaptively produces instant pseudo labels for unlabeled samples based on…
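
The abstract frames TLG as predicting the temporal boundaries of a segment. As a rough illustration of what comparing two such boundaries can look like, the sketch below computes the temporal intersection-over-union of two (start, end) segments. Temporal IoU is not mentioned in the text above; it is used here as an assumption, and the function name `temporal_iou` and the example timestamps are hypothetical.

```python
def temporal_iou(predicted, reference):
    """Intersection-over-union of two (start, end) segments, in seconds.

    Illustrative helper only; the summary above does not state how
    S^4TLG scores or compares segments.
    """
    pred_start, pred_end = predicted
    ref_start, ref_end = reference
    # Overlap length is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, ref_end) - max(pred_start, ref_start))
    union = (pred_end - pred_start) + (ref_end - ref_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical segments: a predicted span and an annotated span.
overlap = temporal_iou((2.0, 8.0), (4.0, 10.0))
disjoint = temporal_iou((0.0, 1.0), (5.0, 6.0))
```

In this toy case the two overlapping segments share 4 of 8 total seconds, and the disjoint pair shares none.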

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization