Static and Dynamic Graph Alignment Network for Temporal Video Grounding

Zhanjie Hu; Bolin Zhang; Jianhua Wang; Jianbo Zheng; Chenchen Yan; Takahiro Komamizu; Ichiro Ide; Jiangbo Qian

arXiv:2605.00684 · cs.CV · May 4, 2026

TL;DR

The paper introduces SDGAN, a novel graph-based model for temporal video grounding that combines static and dynamic features, query-aware alignment, and multi-granularity proposals to improve localization accuracy.

Contribution

SDGAN is the first to jointly exploit static and dynamic features, perform query-aware alignment, and incorporate multi-granularity proposals with progressive training for TVG.

Findings

01. SDGAN outperforms existing methods on three benchmark datasets.

02. Joint static and dynamic features enhance visual representation.

03. Query-clip contrastive learning improves query-aware localization.
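As a rough illustration of what query-clip contrastive learning usually means, the sketch below computes a generic InfoNCE-style loss between one query embedding and a set of clip embeddings. It is not taken from the SDGAN code: the function name `query_clip_info_nce`, the `temperature` value, and the toy embeddings are all assumptions made for the example, and the loss used in the paper may differ.

```python
import numpy as np

def query_clip_info_nce(query, clips, positive_mask, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's exact loss).

    query:         (d,) query embedding
    clips:         (n, d) clip embeddings
    positive_mask: (n,) boolean, True for clips inside the annotated moment
    """
    # Cosine similarity between the query and every clip.
    q = query / np.linalg.norm(query)
    c = clips / np.linalg.norm(clips, axis=1, keepdims=True)
    logits = (c @ q) / temperature
    # Log-softmax over clips, shifted by the max for numerical stability.
    logits = logits - logits.max()
    log_prob = logits - np.log(np.exp(logits).sum())
    # Average negative log-likelihood of the positive clips.
    return float(-log_prob[positive_mask].mean())

# Toy data: three orthogonal clip embeddings, query identical to clip 0.
clips = np.eye(3)
query = clips[0].copy()
loss_aligned = query_clip_info_nce(query, clips, np.array([True, False, False]))
loss_misaligned = query_clip_info_nce(query, clips, np.array([False, True, False]))
```

In this toy setup the loss is small when the positive clip is the one that matches the query and large when a non-matching clip is labeled positive, which is the behavior a contrastive objective is meant to encourage.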

Abstract

Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries. Recently, Graph Convolutional Networks (GCNs) have been widely adopted in TVG to model temporal relations among video clips and enhance contextual reasoning by constructing clip-level graphs. Despite their effectiveness, existing GCN-based TVG methods encounter three critical bottlenecks: 1) most methods construct graph nodes using either static or dynamic features alone, resulting in incomplete visual representation and overlooking complementary semantics; 2) most methods construct temporal graphs in a query-agnostic manner, leading to inefficient feature interaction within the temporal graph representation; and 3) most methods suffer from single-granularity semantic matching, while direct training on complex temporal localization…
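To make the idea of a clip-level graph concrete, here is a minimal sketch of a generic temporal graph convolution over clip features. It illustrates the query-agnostic baseline the abstract describes, not SDGAN itself: the helper names, the fixed neighborhood `window`, and the random features and weights are assumptions chosen for the example.

```python
import numpy as np

def temporal_adjacency(num_clips, window=1):
    """Connect each clip to itself and to clips at most `window` steps away."""
    idx = np.arange(num_clips)
    return (np.abs(idx[:, None] - idx[None, :]) <= window).astype(np.float64)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))  # degree >= 1 due to self-loops
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def gcn_layer(features, adj_norm, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

# Toy example: 8 clips with 16-dim features (random stand-ins for real
# static or dynamic clip features), projected to 4 dimensions.
rng = np.random.default_rng(0)
num_clips, dim_in, dim_out = 8, 16, 4
clip_features = rng.standard_normal((num_clips, dim_in))
weight = rng.standard_normal((dim_in, dim_out)) * 0.1

adj = temporal_adjacency(num_clips, window=1)
adj_norm = normalize_adjacency(adj)
clip_context = gcn_layer(clip_features, adj_norm, weight)
```

Note that the adjacency here depends only on temporal distance, so the same graph is used for every query. That is the query-agnostic construction the abstract lists as its second bottleneck.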

Code & Models

Repositories

ZhanJieHu/SDGAN (GitHub)
