Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

Davide Berghi; Marco Volino; Philip J. B. Jackson

arXiv:2212.01892·eess.AS·December 6, 2022

TL;DR

The paper introduces 'Tragic Talkers', an audio-visual dataset of excerpts from "Romeo and Juliet" that combines synchronized multi-view light-field video, microphone-array audio, and detailed annotations, aimed at advancing immersive audio-visual machine learning research.

Contribution

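It provides a dataset combining light-field video and spatial audio with extensive annotations for a range of talking scenarios. This addresses a gap in existing resources: most multi-view datasets neglect the accompanying audio, while spatial audio datasets are primarily unimodal or include visual data that falls short of production quality.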

Findings

01

The dataset comprises 30 sequences captured from 22 viewpoints and with two microphone arrays.

02

Annotations include face bounding boxes, pose keypoints, and dialogue transcriptions.

03

Enables research in object-based media, spatial audio, and multi-view analysis.

Abstract

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most of the existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio research primarily offer unimodal content, and when visual data is included, the quality is far from meeting the standard production needs. We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the "Romeo and Juliet" drama captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides ideal content for object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues,…
