Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol

Konstantinos Apostolidis; Jakob Abesser; Luca Cuccovillo; Vasileios Mezaris

arXiv:2405.00384·cs.CV·May 2, 2024

TL;DR

This paper introduces a baseline method and protocol for detecting audio-visual discrepancies in videos, utilizing a scene classifier to improve content verification and establish a standard evaluation framework.

Contribution

It presents a novel audio-visual scene classifier and an experimental protocol with a benchmark dataset for detecting inconsistencies between audio and video content.

Findings

01. Achieved state-of-the-art scene classification accuracy

02. Demonstrated promising results in detecting audio-visual discrepancies

03. Provided a new benchmark dataset and evaluation protocol

Abstract

This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications.
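The per-modality comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `detect_discrepancy`, the three scene labels, and the decision rule (flag a discrepancy whenever the top-scoring visual class differs from the top-scoring audio class) are all assumptions made for illustration. The paper's actual classifier, class set, and decision criterion may differ.

```python
from typing import Sequence, Tuple


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first index on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def detect_discrepancy(
    visual_probs: Sequence[float],
    audio_probs: Sequence[float],
    labels: Sequence[str],
) -> Tuple[bool, str, str]:
    """Compare the scene class predicted from each modality.

    Assumed rule (hypothetical, not taken from the paper): a discrepancy
    is flagged when the most probable visual scene class differs from the
    most probable audio scene class.
    """
    if not (len(visual_probs) == len(audio_probs) == len(labels)):
        raise ValueError("score vectors and label list must have equal length")
    visual_class = labels[argmax(visual_probs)]
    audio_class = labels[argmax(audio_probs)]
    return visual_class != audio_class, visual_class, audio_class


# Hypothetical scene labels and scores, for illustration only.
LABELS = ["indoor", "outdoor", "vehicle"]
flagged, visual_class, audio_class = detect_discrepancy(
    [0.1, 0.8, 0.1],  # scores from the visual stream
    [0.7, 0.2, 0.1],  # scores from the audio stream
    LABELS,
)
```

A hard comparison of top-1 classes is the simplest possible rule. A practical system would probably also account for classifier confidence, for example by ignoring clips where either modality's top score is low.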

Code & Models

Repositories

idt-iti/visual-audio-discrepancy-detection (PyTorch, official)

Taxonomy

Topics: Digital Media Forensic Detection