Exploiting Context Information for Generic Event Boundary Captioning

Jinrui Zhang; Teng Wang; Feng Zheng; Ran Cheng; Ping Luo

arXiv:2207.01050 · cs.CV · July 5, 2022

TL;DR

This paper introduces a model for generic event boundary captioning that takes the entire video as input and models interactions between boundaries to generate more accurate descriptions. It scores 72.84 on the test set and placed second in the challenge.

Contribution

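It proposes an approach that processes the whole video at once and models boundary-boundary interactions, so that the captions for all boundaries are generated in parallel and each one can draw on the surrounding video context.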

Findings

01

Achieved a score of 72.84 on the test set.

02

Showed experimentally that using video context information improves captioning over methods that process one boundary at a time.

03

Secured second place in the challenge.

Abstract

Generic Event Boundary Captioning (GEBC) aims to generate three sentences describing the status change at a given time boundary. Previous methods process the information of only one boundary at a time, which leaves the video context unused. To address this, we design a model that takes the whole video as input and generates captions for all boundaries in parallel. By modeling boundary-boundary interactions, the model can learn the context information for each time boundary. Experiments demonstrate the effectiveness of context information. The proposed method achieved a score of 72.84 on the test set, and we reached 2nd place in this challenge. Our code is available at: https://github.com/zjr2000/Context-GEBC
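As a rough illustration of what "modeling boundary-boundary interactions" can mean, the sketch below lets every boundary feature in a video attend to every other boundary with plain single-head scaled dot-product self-attention, so all boundaries receive a context-aware representation in one pass. This is a generic technique chosen for illustration, not the authors' architecture: the function names, the use of raw features as queries, keys, and values (no learned projections), and the residual connection are all assumptions. A caption decoder, not shown, would then turn each context-aware vector into the three GEBC sentences.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def contextualize_boundaries(boundary_feats: List[Vector]) -> List[Vector]:
    """Hypothetical boundary-boundary interaction via single-head self-attention.

    Every boundary attends to all boundaries of the same video (itself included),
    so all boundaries are updated together instead of one at a time.
    Assumption: queries, keys and values are the raw features (no learned weights).
    """
    dim = len(boundary_feats[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in boundary_feats:
        # Attention weights of this boundary over every boundary in the video.
        weights = softmax([dot(q, k) * scale for k in boundary_feats])
        # Weighted sum of all boundary features = context vector for this boundary.
        ctx = [sum(w * v[d] for w, v in zip(weights, boundary_feats)) for d in range(dim)]
        # Residual connection keeps the boundary's own information.
        out.append([x + c for x, c in zip(q, ctx)])
    return out


# Usage: three hypothetical 2-D boundary features from one video.
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in contextualize_boundaries(feats):
    print([round(x, 3) for x in row])
```

Because each output mixes in every other boundary, editing one boundary's feature changes the representation of its neighbours; a model that captions boundaries independently has no such pathway.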

Code & Models

Repositories

zjr2000/context-gebc
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition

Methods: Test