Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning

Yeming Chen; Siyu Zhang; Yaoru Sun; Weijian Liang; Haoran Wang

arXiv:2308.09455 · cs.CV · August 21, 2023

TL;DR

This paper introduces ASH-Nets, a novel hierarchical model combining artificial and spiking neural networks to improve multimodal vision-language representations through semantic encoding and contrastive learning.

Contribution

The work presents a flexible hierarchical network integrating ANNs and SNNs, with novel semantic encoders and a pre-training method for enhanced vision-language task performance.

Findings

1. Achieves competitive results on multiple VL benchmarks.
2. Improves semantic encoding with discrete and continuous latent variables.
3. Enhances efficiency through contrastive learning and hard sample augmentation.
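As a rough illustration of the contrastive objective and the hard-sample idea listed in the findings, the sketch below computes a symmetric image-text InfoNCE loss over a batch of paired embeddings and, for each image, picks the most similar non-matching caption as a hard negative. It is a generic sketch assuming NumPy, L2-normalized embeddings, and a temperature of 0.07. The function names (`info_nce_loss`, `hardest_negatives`) are hypothetical, and the paper's actual loss and hard sample augmentation scheme are not described in this summary and may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE: row i of `img` is the positive pair of row i of `txt`.

    Hypothetical helper; the temperature of 0.07 is an illustrative assumption.
    """
    img = l2_normalize(np.asarray(img, dtype=float))
    txt = l2_normalize(np.asarray(txt, dtype=float))
    logits = img @ txt.T / temperature                          # (batch, batch) similarities
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()   # image -> text direction
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()   # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)


def hardest_negatives(img, txt):
    """For each image, the index of the most similar caption that is NOT its pair."""
    sim = l2_normalize(np.asarray(img, dtype=float)) @ l2_normalize(np.asarray(txt, dtype=float)).T
    np.fill_diagonal(sim, -np.inf)  # exclude the positive pair from the search
    return sim.argmax(axis=1)
```

For example, `info_nce_loss(np.eye(4), np.eye(4))` is close to zero because every image matches its own caption, while shuffling the captions (`np.eye(4)[[1, 2, 3, 0]]`) makes the loss large. The hard negatives returned by `hardest_negatives` could then be given extra weight or fed to a matching head; how ASH-Nets uses them is not stated here.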

Abstract

With the success of self-supervised learning, multimodal foundation models have rapidly adapted to a wide range of downstream tasks driven by vision and language (VL) pre-training. State-of-the-art methods achieve impressive performance by pre-training on large-scale datasets. However, bridging the semantic gap between the two modalities remains a non-negligible challenge for VL tasks. In this work, we propose an efficient computation framework for multimodal alignment by introducing a novel visual semantic module to further improve the performance of VL tasks. Specifically, we propose a flexible model, namely Artificial-Spiking Hierarchical Networks (ASH-Nets), which combines the complementary advantages of artificial neural networks (ANNs) and spiking neural networks (SNNs) to enrich visual semantic representations. In particular, a visual concrete encoder and a semantic abstract…
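To make the ANN/SNN distinction concrete, the sketch below passes continuous ReLU features (the "artificial" side) through a leaky integrate-and-fire (LIF) neuron that emits binary spikes (the "spiking" side) and summarizes them as firing rates, giving a continuous and a discrete view of the same input. It is a minimal sketch assuming NumPy, a hard reset, and illustrative values for the time constant, threshold, and number of time steps. `lif_encode` and `hybrid_forward` are hypothetical names, not the ASH-Nets visual concrete or semantic abstract encoders, whose design is not given in the truncated abstract.

```python
import numpy as np


def lif_encode(x, num_steps=8, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Turn continuous inputs into binary spike trains with a leaky integrate-and-fire neuron.

    Returns an array of shape (num_steps, *x.shape) with values in {0, 1}.
    Parameter defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    v = np.full_like(x, v_reset)                  # membrane potential per neuron
    spikes = np.zeros((num_steps,) + x.shape)
    for t in range(num_steps):
        v = v + (x - (v - v_reset)) / tau         # leaky integration of the input
        fired = v >= v_threshold                  # spike where the threshold is crossed
        spikes[t] = fired
        v = np.where(fired, v_reset, v)           # hard reset after a spike
    return spikes


def hybrid_forward(x, weight, num_steps=8):
    """Hypothetical ANN -> SNN pipeline: ReLU features, then a spike-rate summary."""
    continuous = np.maximum(np.asarray(weight, dtype=float) @ np.asarray(x, dtype=float), 0.0)
    spikes = lif_encode(continuous, num_steps=num_steps)
    rates = spikes.mean(axis=0)                   # fraction of time steps with a spike
    return continuous, rates
```

With an identity weight, `hybrid_forward([0.0, 1.5, 4.0], np.eye(3))` returns the continuous features unchanged and firing rates of `[0.0, 0.5, 1.0]` over 8 steps: a zero input never fires, a moderate input fires every other step, and a strong input fires at every step. The binary spikes are what make SNN computation sparse, which is the efficiency argument usually made for adding a spiking branch.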

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Genomics and Phylogenetic Studies

Methods: Contrastive Learning