A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

Raphael Abreu; Joel dos Santos; Eduardo Bezerra

arXiv:1804.10822·cs.AI·May 1, 2018

TL;DR

This paper introduces a bimodal neural network that uses audio and video signals to improve the synchronization of sensory effects in mulsemedia applications, enhancing timing accuracy and reducing manual effort.

Contribution

The paper presents a novel bimodal neural network architecture that leverages both audio and video data to assist in synchronizing sensory effects in mulsemedia applications, outperforming unimodal methods.
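One common way to combine two modalities is feature-level fusion: concatenate the audio and video feature vectors for a segment and score the result with a classifier. The sketch below illustrates only that general idea. It is not the paper's architecture, which this page does not describe, and the function names, toy feature values, and weights are all hypothetical.

```python
import math


def fuse(audio_features, video_features):
    """Concatenate audio and video feature vectors for one segment."""
    return list(audio_features) + list(video_features)


def label_probability(features, weights, bias):
    """Score one scene label with a single sigmoid unit (illustrative only)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy, made-up values: two audio features and one video feature.
audio = [0.2, 0.9]
video = [0.7]
weights = [1.0, 2.0, -1.0]  # hypothetical learned weights

fused = fuse(audio, video)
probability = label_probability(fused, weights, bias=0.0)
print(fused, round(probability, 3))
```

A unimodal baseline in this toy setting would score `audio` or `video` alone, which is the comparison the findings refer to.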

Findings

1. Bimodal approach yields better synchronization accuracy than unimodal methods.

2. The model trained on Google's AudioSet demonstrates effective scene component prediction.

3. Experimental results confirm the superiority of combined audio-video signals for timing synchronization.

Abstract

In mulsemedia applications, traditional media content (text, image, audio, video, etc.) can be related to media objects that target other human senses (e.g., smell, haptics, taste). Such applications aim at bridging the virtual and real worlds through sensors and actuators. Actuators are responsible for the execution of sensory effects (e.g., wind, heat, light), which produce sensory stimulation for the users. In these applications, sensory stimulation must be timed to match the traditional media content being presented. For example, at the moment an explosion is presented in the audiovisual content, it may be adequate to activate actuators that produce heat and light. It is common to use a declarative multimedia authoring language to relate the timestamp at which each media object is presented to the execution of a sensory effect. One problem…
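To make the timing idea concrete, here is a minimal sketch of how per-segment scene predictions could be turned into sensory-effect intervals, using the abstract's example of an explosion triggering heat and light. The function `effect_intervals`, the label names, the 0.5 threshold, and the one-second segment length are all assumptions for illustration; the paper's own procedure is not described on this page.

```python
def effect_intervals(predictions, label_to_effects, threshold=0.5, segment_len=1.0):
    """Turn (start_seconds, label, probability) predictions into effect intervals.

    Segments that touch or overlap are merged, so each returned tuple
    (effect, start, end) is one continuous activation of an actuator.
    """
    active = {}     # effect -> [start, end] of the interval being extended
    intervals = []
    for start, label, prob in sorted(predictions):
        if prob < threshold:
            continue  # prediction too uncertain to trigger an effect
        for effect in label_to_effects.get(label, ()):
            end = start + segment_len
            current = active.get(effect)
            if current is not None and start <= current[1]:
                current[1] = max(current[1], end)  # extend the running interval
            else:
                if current is not None:
                    intervals.append((effect, current[0], current[1]))
                active[effect] = [start, end]
    for effect, (start, end) in active.items():
        intervals.append((effect, start, end))
    return sorted(intervals, key=lambda item: (item[1], item[0]))


# Hypothetical per-second predictions and label-to-effect mapping.
predictions = [
    (0.0, "speech", 0.90),
    (3.0, "explosion", 0.80),
    (4.0, "explosion", 0.70),
    (5.0, "explosion", 0.20),
    (9.0, "explosion", 0.95),
]
mapping = {"explosion": ["heat", "light"]}

for effect, start, end in effect_intervals(predictions, mapping):
    print(f"{effect}: {start:.1f}s to {end:.1f}s")
```

The resulting intervals are the kind of timestamps an author would otherwise enter by hand in a declarative authoring language.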

Code & Models

Repositories

MLRG-CEFET-RJ/bimodal_audioset (TensorFlow, official)
