Multi-modal data generation with a deep metric variational autoencoder
Josefine Vilsbøll Sundgaard, Morten Rieger Hannemose, Søren Laugesen, Peter Bray, James Harte, Yosuke Kamide, Chiemi Tanaka, Rasmus R. Paulsen, and Anders Nymark Christensen

TL;DR
This paper introduces a deep metric variational autoencoder that enables conditional multi-modal data generation, demonstrated on otoscopy images and tympanometry measurements, facilitating data augmentation for correlated but non-pixel-aligned modalities.
Contribution
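The paper proposes a deep metric variational autoencoder that applies a triplet loss in the latent space, so that samples of the same class cluster together and data can be generated conditionally per class. The method is applied to paired medical data (otoscopy images and wideband tympanometry measurements) as a means of multi-modal data augmentation.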
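To make the triplet loss concrete, here is a minimal sketch of the standard triplet margin loss computed on latent vectors. The summary does not give the paper's exact formulation, so the Euclidean distance, the `margin` value, and the function name `triplet_loss` are assumptions chosen for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch of latent vectors.

    Each argument has shape (batch, latent_dim). `positive` shares the
    anchor's class; `negative` comes from a different class.
    The margin of 1.0 is an illustrative assumption.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    # Penalize triplets where the negative is not at least `margin` farther away.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy latent codes: the positive is close to the anchor, the negative is far.
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])
far = np.array([[3.0, 4.0]])

well_separated = triplet_loss(anchor, near, far)   # classes already separated
badly_separated = triplet_loss(anchor, far, near)  # roles swapped, large penalty
```

In a variational autoencoder this term would typically be added to the reconstruction and KL terms, with a weighting that this summary does not specify.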
Findings
Effective conditional generation of image-tympanometry pairs
Promising results in multi-modal data augmentation
Potential for improved medical data synthesis
Abstract
We present a deep metric variational autoencoder for multi-modal data generation. The variational autoencoder employs triplet loss in the latent space, which allows for conditional data generation by sampling in the latent space within each class cluster. The approach is evaluated on a multi-modal dataset consisting of otoscopy images of the tympanic membrane with corresponding wideband tympanometry measurements. The modalities in this dataset are correlated, as they represent different aspects of the state of the middle ear, but they do not present a direct pixel-to-pixel correlation. The approach shows promising results for the conditional generation of pairs of images and tympanograms, and will allow for efficient data augmentation of data from multi-modal sources.
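Sampling "within each class cluster" could be sketched as follows. This is one plausible reading, not the authors' stated procedure: it assumes each class cluster is summarized by a Gaussian fitted to the encoded training latents, and the helper names `fit_class_gaussians` and `sample_class` are hypothetical. The trained decoders for the image and the tympanogram are not shown.

```python
import numpy as np

def fit_class_gaussians(latents, labels):
    """Fit a mean and covariance to the latent codes of each class.

    latents: (n_samples, latent_dim) array with latent_dim >= 2.
    labels:  (n_samples,) integer class labels.
    """
    stats = {}
    for c in np.unique(labels):
        z = latents[labels == c]
        stats[int(c)] = (z.mean(axis=0), np.cov(z, rowvar=False))
    return stats

def sample_class(stats, c, n, rng):
    """Draw n latent vectors from the Gaussian fitted to class c."""
    mean, cov = stats[c]
    return rng.multivariate_normal(mean, cov, size=n)

# Toy stand-in for encoded training data: two well-separated class clusters.
rng = np.random.default_rng(0)
z0 = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
z1 = rng.normal(loc=5.0, scale=0.1, size=(100, 2))
latents = np.vstack([z0, z1])
labels = np.array([0] * 100 + [1] * 100)

stats = fit_class_gaussians(latents, labels)
new_z = sample_class(stats, 1, 200, rng)  # latent codes conditioned on class 1
```

Each sampled latent vector would then be passed through both decoders to obtain a matching image and tympanogram pair for the chosen class.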
Taxonomy
Topics: Speech and Audio Processing · Underwater Acoustics Research · Nasal Surgery and Airway Studies
Methods: Triplet Loss
