Multimodal Chaptering for Long-Form TV Newscast Video

Khalil Guetari; Yannis Tevissen (ARMEDIA-SAMOVAR); Frédéric Petitpont

arXiv:2406.17590 · cs.MM · June 26, 2024

TL;DR

This paper introduces a multimodal approach combining audio and visual cues with neural networks to automatically segment long TV newscast videos, improving organization and retrieval of broadcast content.

Contribution

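The paper presents a two-stage model for video chaptering that fuses audio and visual features: frozen neural networks extract features from each modality, and a trained LSTM fuses them to predict segment boundaries. The model achieves state-of-the-art performance on a dataset of over 500 TV newscast videos.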
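To make the two-stage design concrete, here is a minimal, untrained sketch of the second stage. It assumes the frozen first-stage encoders have already produced one audio embedding and one visual embedding per time step. The page does not give the actual encoders, feature sizes, fusion layout, or training setup, so `fuse`, `TinyLSTM`, the random stand-in embeddings, and all dimensions are hypothetical, and fusing by per-step concatenation before the LSTM is likewise an assumption. Only the standard library is used so the data flow stays visible.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    # Dense matrix-vector product; each row of w has len(v) entries.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def fuse(audio, visual):
    # Early fusion (an assumption): concatenate the audio and visual embeddings
    # of each time step; both sequences must already share one time grid.
    if len(audio) != len(visual):
        raise ValueError("modalities must be aligned to the same time steps")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


class TinyLSTM:
    """Hypothetical single-layer LSTM with a per-step boundary head (untrained)."""

    GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        z_dim = input_dim + hidden_dim  # each gate sees [x_t, h_{t-1}]
        self.hidden_dim = hidden_dim
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(z_dim)]
                      for _ in range(hidden_dim)] for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def forward(self, seq):
        """Return one boundary probability per time step of `seq`."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        probs = []
        for x in seq:
            z = list(x) + h
            pre = {g: [u + bias for u, bias in zip(matvec(self.w[g], z), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(u) for u in pre["i"]]
            f = [sigmoid(u) for u in pre["f"]]
            o = [sigmoid(u) for u in pre["o"]]
            cand = [math.tanh(u) for u in pre["c"]]
            c = [fj * cj + ij * kj for fj, cj, ij, kj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            # Boundary head: linear projection of h_t squashed to (0, 1).
            probs.append(sigmoid(sum(wj * hj for wj, hj in zip(self.w_out, h))))
        return probs


# Usage with random stand-ins for frozen-encoder outputs (hypothetical sizes).
rng = random.Random(1)
T = 10  # number of time steps
audio = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(T)]
visual = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(T)]
model = TinyLSTM(input_dim=4 + 3, hidden_dim=8)
probs = model.forward(fuse(audio, visual))
```

In practice such a model would be built with a deep learning framework and trained against annotated chapter boundaries; the sketch only shows how two aligned feature streams can enter one recurrent network and come out as a boundary score per step.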

Findings

01

Achieved 82% precision at an IoU threshold of 90%

02

Outperformed existing methods in segment boundary detection

03

Validated on over 500 TV newscast videos from the French channel TF1, averaging 41 minutes each
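The headline metric counts a predicted chapter as correct only when it overlaps a reference chapter with an intersection over union (IoU) of at least 0.9. The page does not state the exact matching protocol, so the sketch below assumes segments given as `(start, end)` times in seconds and greedy one-to-one matching; `interval_iou` and `precision_at_iou` are hypothetical helper names.

```python
def interval_iou(a, b):
    """IoU of two (start, end) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predicted, reference, threshold=0.9):
    """Fraction of predicted segments matched one-to-one to a reference
    segment with IoU >= threshold (greedy matching, an assumption)."""
    if not predicted:
        return 0.0
    unmatched = list(reference)
    hits = 0
    for seg in predicted:
        # Best still-unmatched reference segment for this prediction.
        best = max(unmatched, key=lambda r: interval_iou(seg, r), default=None)
        if best is not None and interval_iou(seg, best) >= threshold:
            hits += 1
            unmatched.remove(best)  # each reference chapter can match only once
    return hits / len(predicted)


reference = [(0, 120), (120, 300), (300, 600)]
predicted = [(0, 118), (125, 300), (300, 450)]
print(precision_at_iou(predicted, reference, threshold=0.9))
# -> 0.6666666666666666 (2 of 3 predicted chapters matched)
```

With the threshold lowered to 0.5, the third prediction (IoU exactly 0.5) would also count, which illustrates how strict a 90% IoU criterion is for chapter boundaries.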

Abstract

We propose a novel approach for automatic chaptering of TV newscast videos, addressing the challenge of structuring and organizing large collections of unsegmented broadcast content. Our method integrates both audio and visual cues through a two-stage process involving frozen neural networks and a trained LSTM network. The first stage extracts essential features from separate modalities, while the LSTM fuses these features to generate accurate segment boundaries. Our proposed model has been evaluated on a diverse dataset of over 500 TV newscast videos gathered from TF1, a French TV channel; the videos vary in length and topic and average 41 minutes. Experimental results demonstrate that this fusion strategy achieves state-of-the-art performance, yielding a precision of 82% at an IoU of 90%. Consequently, this approach significantly enhances…
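The abstract says the LSTM generates segment boundaries but does not describe how its outputs become chapters. One simple decoding, assumed here rather than taken from the paper, is to threshold per-step boundary probabilities and enforce a minimum gap between accepted boundaries. `probs_to_segments`, `threshold`, `min_gap`, and the fixed `step_seconds` sampling rate are all hypothetical.

```python
def probs_to_segments(probs, step_seconds, threshold=0.5, min_gap=3):
    """Convert per-step boundary probabilities into (start, end) chapters in seconds.

    A step t > 0 starts a new chapter when probs[t] >= threshold and it lies at
    least `min_gap` steps after the previously accepted boundary.
    """
    if not probs:
        return []
    boundaries = [0]  # the video always starts a chapter at step 0
    for t, p in enumerate(probs):
        if t > 0 and p >= threshold and t - boundaries[-1] >= min_gap:
            boundaries.append(t)
    boundaries.append(len(probs))  # close the last chapter at the end of the video
    return [(s * step_seconds, e * step_seconds)
            for s, e in zip(boundaries, boundaries[1:])]


probs = [0.1, 0.1, 0.1, 0.9, 0.6, 0.1, 0.1, 0.8, 0.1, 0.1]
print(probs_to_segments(probs, step_seconds=10))
# -> [(0, 30), (30, 70), (70, 100)]
```

In the example, step 4 (probability 0.6) is suppressed because it falls only one step after the boundary at step 3; the `min_gap` rule keeps a single high-scoring region from splitting into several tiny chapters.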
