SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

Chaolei Tan; Zihang Lin; Junfu Pu; Zhongang Qi; Wei-Yi Pei; Zhi Qu; Yexin Wang; Ying Shan; Wei-Shi Zheng; Jian-Fang Hu

arXiv:2408.01669·cs.CV·August 20, 2024

TL;DR

SynopGround introduces a large-scale, detailed dataset from TV dramas for multi-paragraph video grounding, enabling models to understand complex storylines and long-term dependencies in videos.

Contribution

The paper presents SynopGround, a novel large-scale dataset with detailed annotations for multi-paragraph video grounding, and proposes LGMR, a new model for better long-term multimodal reasoning.

Findings

01. LGMR outperforms previous methods in long-term video grounding tasks.

02. SynopGround enables learning of complex storylines and abstract expressions.

03. The dataset supports research on intricate multimodal understanding.

Abstract

Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders the model from evolving toward stronger multimodal understanding capabilities. To address these limitations, we present a large-scale video grounding dataset named SynopGround, in which more than 2800 hours of videos are sourced from popular TV dramas and are paired with accurately localized human-written synopses. Each paragraph in the synopsis serves as a language query and is manually annotated with precise temporal boundaries in the long video. These paragraph queries are tightly correlated to each other and contain a wealth of abstract expressions summarizing video storylines and…
