# Synthesizing Images from Spatio-Temporal Representations using Spike-based Backpropagation

**Authors:** Deboleena Roy, Priyadarshini Panda, and Kaushik Roy

arXiv: 1906.08861 · 2019-06-24

## TL;DR

This paper introduces a spike-based auto-encoder approach for synthesizing images from multi-modal spatio-temporal data, demonstrating effective cross-modal audio-to-image conversion with low reconstruction loss and competitive performance.

## Contribution

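The paper presents a direct spike-based training algorithm for spiking auto-encoders: loss is computed on the membrane potential of the output layer and back-propagated through a sigmoid approximation of the neuron's activation function. This enables image synthesis from both audio and visual inputs in a spike-based environment.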

## Key findings

- Achieves very low reconstruction loss on MNIST and Fashion-MNIST, comparable to artificial neural networks (ANNs).
- Synthesizes high-fidelity MNIST images from TI-46 digit audio samples.
- Delivers competitive audio-to-image synthesis performance against ANNs.

## Abstract

Spiking neural networks (SNNs) offer a promising alternative to current artificial neural networks to enable low-power event-driven neuromorphic hardware. Spike-based neuromorphic applications require processing and extracting meaningful information from spatio-temporal data, represented as series of spike trains over time. In this paper, we propose a method to synthesize images from multiple modalities in a spike-based environment. We use spiking auto-encoders to convert image and audio inputs into compact spatio-temporal representations that are then decoded for image synthesis. For this, we use a direct training algorithm that computes loss on the membrane potential of the output layer and back-propagates it by using a sigmoid approximation of the neuron's activation function to enable differentiability. The spiking autoencoders are benchmarked on MNIST and Fashion-MNIST and achieve very low reconstruction loss, comparable to ANNs. Then, spiking autoencoders are trained to learn meaningful spatio-temporal representations of the data, across the two modalities - audio and visual. We synthesize images from audio in a spike-based environment by first generating, and then utilizing such shared multi-modal spatio-temporal representations. Our audio-to-image synthesis model is tested on the task of converting TI-46 digits audio samples to MNIST images. We are able to synthesize images with high fidelity and the model achieves competitive performance against ANNs.
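
The training idea described in the abstract (loss on the output membrane potential, with a sigmoid standing in for the non-differentiable spike function during the backward pass) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the non-leaky integrate-and-fire neuron, the reset-to-zero rule, the truncated hidden-layer gradient, the layer sizes, and every function name below are assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def surrogate_grad(u, threshold=1.0):
    """Derivative of sigmoid(u - threshold), used in place of the step function's gradient."""
    s = sigmoid(u - threshold)
    return s * (1.0 - s)

def forward(spikes_in, w_hid, w_out, threshold=1.0):
    """Integrate-and-fire hidden layer; the output layer only integrates (never fires)."""
    steps = spikes_in.shape[0]
    v = np.zeros(w_hid.shape[1])
    u_trace = np.zeros((steps, w_hid.shape[1]))  # pre-reset hidden potentials
    s_trace = np.zeros((steps, w_hid.shape[1]))  # hidden spikes (0 or 1)
    v_out = np.zeros(w_out.shape[1])
    for t in range(steps):
        u = v + spikes_in[t] @ w_hid
        s = (u >= threshold).astype(np.float64)
        v = u * (1.0 - s)                        # reset neurons that fired
        v_out += s @ w_out                       # accumulate output membrane potential
        u_trace[t], s_trace[t] = u, s
    return v_out / steps, u_trace, s_trace

def loss_and_grads(spikes_in, target, w_hid, w_out, threshold=1.0):
    """Squared error on the output membrane potential and its (approximate) gradients."""
    steps = spikes_in.shape[0]
    v_out, u_trace, s_trace = forward(spikes_in, w_hid, w_out, threshold)
    err = v_out - target
    loss = 0.5 * np.sum(err ** 2)
    # Exact for the output weights: v_out is linear in w_out once hidden spikes are fixed.
    grad_w_out = np.outer(s_trace.mean(axis=0), err)
    # Error reaching each hidden spike, passed through the sigmoid surrogate.
    delta = (w_out @ err) / steps * surrogate_grad(u_trace, threshold)
    # Assumption: truncated in time, each step's potential is treated as
    # depending only on that step's input spikes.
    grad_w_hid = spikes_in.T @ delta
    return loss, grad_w_hid, grad_w_out

# Hypothetical toy input: a 16-pixel "image" rate-coded as 20 steps of Bernoulli spikes.
rng = np.random.default_rng(0)
image = rng.random(16)
spikes = (rng.random((20, 16)) < image).astype(np.float64)
w_hid = rng.normal(0.0, 0.5, (16, 8))
w_out = rng.normal(0.0, 0.5, (8, 16))
loss, g_hid, g_out = loss_and_grads(spikes, image, w_hid, w_out)
```

In this toy auto-encoder setting the target is the input image itself, so a gradient step such as `w_out -= lr * g_out` (with a hypothetical learning rate `lr`) nudges the time-averaged output membrane potential toward the pixel intensities.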

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08861/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08861/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1906.08861/full.md

---
Source: https://tomesphere.com/paper/1906.08861