Improving Vision-Language Alignment with Graph Spiking Hybrid Networks
Siyu Zhang, Wenzhe Liu, Yeming Chen, Yiming Wu, Heming Zheng, Cheng Cheng

TL;DR
This paper introduces a novel graph spiking hybrid network that leverages panoptic segmentation and contrastive learning to improve vision-language alignment by capturing rich semantic relations and contextual features.
Contribution
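The paper proposes the Graph Spiking Hybrid Network (GSHN), which combines Spiking Neural Networks (SNNs) with Graph Attention Networks (GATs), and uses panoptic segmentation together with a new pre-training method to enhance semantic representations for vision-language (VL) tasks.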
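The sketch below shows one way a graph-attention layer and spiking neurons could be chained. It assumes a single attention head, a leaky integrate-and-fire (LIF) neuron with hard reset, and rate coding over a fixed number of time steps. None of these choices, nor the helper names `gat_layer`, `lif_rates` and `hybrid_layer`, come from the paper; they are hypothetical illustrations of the two ingredients GSHN combines, not the authors' architecture.

```python
import math

# Hypothetical sketch: one graph-attention layer whose outputs drive
# leaky integrate-and-fire (LIF) neurons. Not the authors' GSHN code.

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gat_layer(features, adjacency, W, a):
    """Single-head graph attention over neighbour lists (self-loop added)."""
    h = [matvec(W, f) for f in features]  # linear projection W h_i
    out = []
    for i, neighbours in enumerate(adjacency):
        nbrs = [i] + [j for j in neighbours if j != i]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); h[i] + h[j] concatenates the lists
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append([sum(al * h[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(h[i]))])
    return out

def lif_rates(currents, steps=10, tau=2.0, threshold=1.0):
    """Drive one LIF neuron per input with a constant current; return firing rates."""
    v = [0.0] * len(currents)
    spikes = [0] * len(currents)
    for _ in range(steps):
        for k, c in enumerate(currents):
            v[k] += (c - v[k]) / tau  # leaky integration toward the input current
            if v[k] >= threshold:
                spikes[k] += 1
                v[k] = 0.0  # hard reset after a spike
    return [s / steps for s in spikes]

def hybrid_layer(features, adjacency, W, a, steps=10):
    """GAT aggregation followed by spike-rate encoding of each node's features."""
    return [lif_rates(node, steps=steps) for node in gat_layer(features, adjacency, W, a)]

# Toy graph: three nodes in a chain 0 - 1 - 2 with 2-D features.
feats = [[1.0, 0.2], [0.4, 1.5], [2.0, 0.0]]
adj = [[1], [0, 2], [1]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, -0.5]
print(hybrid_layer(feats, adj, W, a))
```

Rate coding is used here only because it makes the spiking output easy to inspect; a spiking model could equally pass spike trains, rather than rates, to later layers.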
Findings
GSHN outperforms existing models on multiple VL benchmarks.
The use of contrastive learning improves embedding similarity and model robustness.
Panoptic segmentation enhances the quality of visual semantic features.
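A symmetric InfoNCE loss, as used in CLIP-style image-text training, is one common form of contrastive learning for VL alignment. This summary does not give GSHN's exact objective, so the sketch below is an assumption: cosine similarities scaled by a temperature (0.07 here, an arbitrary choice), with matched image-text pairs on the diagonal treated as positives and every other pair in the batch as a negative. The function names are hypothetical.

```python
import math

# Hypothetical sketch of a symmetric InfoNCE (CLIP-style) contrastive loss;
# the paper's actual objective and temperature are not given in this summary.

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def info_nce(image_emb, text_emb, temperature=0.07):
    """Row i of each list is a matched image-text pair; other rows are negatives."""
    n = len(image_emb)
    sim = [[cosine(image_emb[i], text_emb[j]) / temperature for j in range(n)]
           for i in range(n)]
    image_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

images = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(images, [[0.9, 0.1], [0.1, 0.9]]))  # matched captions: small loss
print(info_nce(images, [[0.1, 0.9], [0.9, 0.1]]))  # swapped captions: large loss
```

Minimizing this loss raises the similarity of each matched pair relative to the mismatched pairs in the same batch.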
Abstract
Bridging the semantic gap between vision and language (VL) requires a good alignment strategy, one that handles semantic diversity, represents visual information abstractly, and generalizes well. Recent works represent visual semantics with detector-based bounding boxes or with patches from regular partitions. Although these paradigms have made progress, they still fail to fully capture the nuanced contextual relations among objects. This paper proposes a comprehensive visual semantic representation module that uses panoptic segmentation to generate coherent, fine-grained semantic features. Furthermore, we propose a novel Graph Spiking Hybrid Network (GSHN) that integrates the complementary advantages of Spiking Neural Networks (SNNs) and Graph Attention Networks (GATs) to encode visual semantic…
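The abstract does not spell out how panoptic segments become graph nodes. One plausible construction, assumed here for illustration only, treats each panoptic segment as a node whose feature is the mean of its pixel features, and links two segments whenever they touch under 4-connectivity. `segments_to_graph` is a hypothetical helper, and the label map and features are toy inputs rather than real segmentation output.

```python
# Hypothetical sketch: turn a panoptic label map into a segment graph.
# Assumes node feature = mean pixel feature and edges = 4-connected adjacency.

def segments_to_graph(label_map, feature_map):
    """Return (segment_ids, node_features, edges) for a 2-D panoptic label map."""
    sums, counts, edges = {}, {}, set()
    rows, cols = len(label_map), len(label_map[0])
    for r in range(rows):
        for c in range(cols):
            seg, feat = label_map[r][c], feature_map[r][c]
            if seg not in sums:
                sums[seg], counts[seg] = [0.0] * len(feat), 0
            sums[seg] = [s + f for s, f in zip(sums[seg], feat)]
            counts[seg] += 1
            # Look right and down only, so each adjacent pixel pair is checked once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and label_map[rr][cc] != seg:
                    edges.add(tuple(sorted((seg, label_map[rr][cc]))))
    segment_ids = sorted(sums)
    node_features = [[s / counts[seg] for s in sums[seg]] for seg in segment_ids]
    return segment_ids, node_features, sorted(edges)

# Toy 2x3 image with three segments and a 1-D feature per pixel.
labels = [[1, 1, 2],
          [1, 3, 2]]
features = [[[1.0], [3.0], [5.0]],
            [[2.0], [4.0], [6.0]]]
print(segments_to_graph(labels, features))
# → ([1, 2, 3], [[2.0], [5.5], [4.0]], [(1, 2), (1, 3), (2, 3)])
```

Mean pooling and pixel adjacency are the simplest choices; a real module might instead pool backbone features per mask or connect segments by semantic rather than spatial relations.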
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need · Contrastive Learning
