Multimodal Structure-Aware Quantum Data Processing
Hala Hawashin, Mehrnoosh Sadrzadeh

TL;DR
This paper introduces MultiQ-NLP, a quantum computing framework for processing multimodal text and image data that captures both linguistic and visual structure and achieves results competitive with classical models.
Contribution
It develops a novel quantum-based architecture for structure-aware multimodal data processing, integrating syntactic and visual hierarchies.
Findings
Achieved parity with state-of-the-art classical models on image classification.
Developed fully structured quantum models for multimodal data.
Enriched the translation of text into variational quantum circuits with new types and type homomorphisms.
Abstract
While large language models (LLMs) have advanced the field of natural language processing (NLP), their "black box" nature obscures their decision-making processes. To address this, researchers developed structured approaches using higher order tensors. These are able to model linguistic relations, but stall when training on classical computers due to their excessive size. Tensors are natural inhabitants of quantum systems and training on quantum computers provides a solution by translating text to variational quantum circuits. In this paper, we develop MultiQ-NLP: a framework for structure-aware data processing with multimodal text+image data. Here, "structure" refers to syntactic and grammatical relationships in language, as well as the hierarchical organization of visual elements in images. We enrich the translation with new types and type homomorphisms and develop novel architectures…
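The abstract's point about tensor size can be made concrete with a small sketch. The code below is an illustration only, not the MultiQ-NLP implementation: it assumes a common tensor-based convention in which a transitive verb is an order-3 tensor contracted with subject and object vectors, and every function name and toy value is hypothetical. It also counts how many entries such a tensor has, and how many qubits would have a state space at least that large, using the fact that n qubits carry 2^n amplitudes.

```python
# Illustrative sketch only: toy dimensions and values, not the paper's model.
from itertools import product


def sentence_vector(subj, verb, obj):
    """Contract an order-3 verb tensor with subject and object vectors.

    verb is indexed as verb[i][s][j]; the result lives in the sentence space:
    out[s] = sum over i, j of subj[i] * verb[i][s][j] * obj[j].
    """
    s_dim = len(verb[0])
    out = [0.0] * s_dim
    for i, s, j in product(range(len(subj)), range(s_dim), range(len(obj))):
        out[s] += subj[i] * verb[i][s][j] * obj[j]
    return out


def tensor_entries(dim, order):
    """Number of entries in a tensor with `order` indices, each of size `dim`."""
    return dim ** order


def qubits_needed(dim, order):
    """Smallest n such that 2**n >= dim**order (amplitudes of an n-qubit state)."""
    return (tensor_entries(dim, order) - 1).bit_length()


# Toy example: 2-dimensional noun and sentence spaces (hypothetical values).
subj = [1.0, 0.0]
obj = [0.0, 1.0]
verb = [
    [[0.0, 1.0], [0.0, 0.5]],  # verb[0][s][j]
    [[0.0, 0.0], [0.0, 0.0]],  # verb[1][s][j]
]

meaning = sentence_vector(subj, verb, obj)
```

With 100-dimensional spaces, a single order-3 verb tensor already has `tensor_entries(100, 3)` = 1,000,000 entries, while `qubits_needed(100, 3)` is 20. This is the kind of gap the abstract alludes to when it calls tensors natural inhabitants of quantum systems; how the paper actually assigns qubits to types is not stated in this summary.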
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Neural Networks and Reservoir Computing · Fractal and DNA sequence analysis
