Live Video Captioning

Eduardo Blanco-Fernández; Carlos Gutiérrez-Álvarez; Nadia Nasri; Saturnino Maldonado-Bascón; Roberto J. López-Sastre

arXiv:2406.14206 · cs.CV · May 27, 2025

TL;DR

This paper introduces Live Video Captioning (LVC), an online task in which captions are generated for streaming video as it arrives. It also presents a model for the task, evaluation metrics designed for the online setting, and experiments demonstrating the approach's effectiveness over traditional offline methods.

Contribution

The paper formally defines the Live Video Captioning problem, proposes evaluation metrics designed for the online scenario, and develops a model that combines deformable transformers with temporal filtering to caption video streams.

Findings

01. The proposed model outperforms state-of-the-art offline methods in live captioning tasks.

02. New evaluation metrics better capture the performance of online captioning systems.

03. Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Dense video captioning involves detecting and describing events within video sequences. Traditional methods operate in an offline setting, assuming the entire video is available for analysis. In contrast, in this work we introduce a groundbreaking paradigm: Live Video Captioning (LVC), where captions must be generated for video streams in an online manner. This shift brings unique challenges, including processing partial observations of the events and the need for a temporal anticipation of the actions. We formally define the novel problem of LVC and propose innovative evaluation metrics specifically designed for this online scenario, demonstrating their advantages over traditional metrics. To address the novel complexities of LVC, we present a new model that combines deformable transformers with temporal filtering, enabling effective captioning over video streams. Extensive experiments…
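The online constraint described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: `stream_windows`, `describe`, the window size, and the stride are all hypothetical names and values chosen for illustration, and integer indices stand in for video frames. The point is only that, at each step, the captioner sees a bounded buffer of past frames and never the full video, so an event may be only partially observed when a caption has to be produced.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def stream_windows(
    frames: Iterable[int], window: int, stride: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (index_of_latest_frame, buffered_frames) every `stride` frames.

    Only frames that have already arrived are available; the buffer keeps
    at most `window` of them, mimicking a partial observation of an event.
    """
    buffer: Deque[int] = deque(maxlen=window)
    for t, frame in enumerate(frames):
        buffer.append(frame)
        if (t + 1) % stride == 0:
            yield t, list(buffer)


def describe(window_frames: List[int]) -> str:
    """Hypothetical stand-in for a captioning model (assumption, not the paper's)."""
    return f"caption for frames {window_frames[0]}..{window_frames[-1]}"


# Simulate a 10-frame stream; a caption is emitted every 2 frames
# from the 4 most recent frames seen so far.
outputs = [
    (t, describe(w)) for t, w in stream_windows(range(10), window=4, stride=2)
]
for t, caption in outputs:
    print(t, caption)
```

An offline method would instead receive all ten frames at once before producing any caption; here the first caption is emitted after only two frames have been seen.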

Code & Models

Repositories

gramuah/lvc (Official)
