When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product

Youqi Wu; Jingwei Zhang; Farzan Farnia

arXiv:2506.08645 · cs.LG · October 31, 2025

TL;DR

This paper introduces KrossFuse, a novel kernel fusion method using Kronecker products to combine embeddings from different models, improving multi-modal data representation and bridging the gap between cross-modal and unimodal embeddings.

Contribution

The paper proposes a principled approach to embedding fusion based on kernel multiplication, together with a scalable approximation of it, and applies it to integrating both multi-modal and unimodal embeddings.
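Scalability is a real concern here. Fusing a d1-dimensional embedding with a d2-dimensional one through a Kronecker product yields d1·d2 features. This page does not describe how the paper's scalable approximation works, so the sketch below is an assumption, not the authors' method. It uses the standard random-feature trick: project each embedding with its own independent Gaussian matrix, then multiply the two projections elementwise. The inner products of the resulting features are unbiased estimates of the product kernel. The function names `rp_fuse` and `unit_rows`, the output dimension `m`, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def rp_fuse(E1: np.ndarray, E2: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Hypothetical random-projection approximation of row-wise Kronecker fusion.

    E1 is (n, d1) and E2 is (n, d2); row i of each describes the same sample.
    Returns (n, m) features Z such that Z @ Z.T is an unbiased estimate of
    (E1 @ E1.T) * (E2 @ E2.T), without materializing d1 * d2 features.
    """
    rng = np.random.default_rng(seed)
    # Independent Gaussian projections, one per embedding model.
    R1 = rng.standard_normal((E1.shape[1], m))
    R2 = rng.standard_normal((E2.shape[1], m))
    # Elementwise product of projections; dividing by sqrt(m) makes Z @ Z.T
    # the average of m independent unbiased estimates.
    return (E1 @ R1) * (E2 @ R2) / np.sqrt(m)


def unit_rows(E: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, so linear kernels are cosine similarities."""
    return E / np.linalg.norm(E, axis=1, keepdims=True)


rng = np.random.default_rng(1)
E1 = unit_rows(rng.standard_normal((5, 64)))  # stand-in for embedding model 1
E2 = unit_rows(rng.standard_normal((5, 32)))  # stand-in for embedding model 2

Z = rp_fuse(E1, E2, m=20_000)
exact = (E1 @ E1.T) * (E2 @ E2.T)
print(Z.shape)  # (5, 20000)
print(np.abs(Z @ Z.T - exact).max())  # small; shrinks roughly as 1/sqrt(m)
```

In practice m would be chosen well below d1·d2. For example, two 768-dimensional embeddings give 589,824 Kronecker features. The large m in this toy check only serves to make the approximation error visibly small.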

Findings

1. RP-KrossFuse, the scalable approximate variant of KrossFuse, effectively combines models and improves performance.

2. Fusion preserves cross-modal alignment while improving modality-specific accuracy.

3. The approach bridges the gap between cross-modal and unimodal embeddings.

Abstract

State-of-the-art embeddings often capture distinct yet complementary discriminative features: For instance, one image embedding model may excel at distinguishing fine-grained textures, while another focuses on object-level structure. Motivated by this observation, we propose a principled approach to fuse such complementary representations through kernel multiplication. Multiplying the kernel similarity functions of two embeddings allows their discriminative structures to interact, producing a fused representation whose kernel encodes the union of the clusters identified by each parent embedding. This formulation also provides a natural way to construct joint kernels for paired multi-modal data (e.g., image-text tuples), where the product of modality-specific kernels inherits structure from both domains. We highlight that this kernel product is mathematically realized via the Kronecker…
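The Kronecker-product identity behind the abstract can be checked numerically. Take feature vectors a and c from the first embedding and b and d from the second. Then ⟨a⊗b, c⊗d⟩ = ⟨a,c⟩·⟨b,d⟩. Fusing each sample's two embeddings with a Kronecker product therefore gives a vector whose linear kernel equals the product of the two parent kernels. The sketch below assumes L2-normalized embeddings with linear (cosine) kernels. It first verifies the identity on random vectors. It then shows the "union of clusters" effect on a toy example: two points stay similar under the product kernel only if both parent kernels find them similar, so any split made by either parent survives. The helper names (`normalize`, `kron_fuse`, `fused_gram`) and the toy one-hot embeddings are hypothetical. For paired image-text data, the same construction would use an image embedding as the first factor and a text embedding as the second.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def kron_fuse(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fuse one sample's two embeddings into a single len(a) * len(b) vector."""
    return np.kron(a, b)


def fused_gram(E1: np.ndarray, E2: np.ndarray) -> np.ndarray:
    """Linear-kernel Gram matrix of the row-wise Kronecker-fused embeddings.

    E1 is (n, d1) and E2 is (n, d2); row i of each describes the same sample.
    """
    fused = np.stack([kron_fuse(e1, e2) for e1, e2 in zip(E1, E2)])
    return fused @ fused.T


rng = np.random.default_rng(0)
# Two hypothetical embeddings (dims 4 and 3, chosen arbitrarily) of samples x and y.
x1, y1 = normalize(rng.standard_normal(4)), normalize(rng.standard_normal(4))
x2, y2 = normalize(rng.standard_normal(3)), normalize(rng.standard_normal(3))

fused_kernel = kron_fuse(x1, x2) @ kron_fuse(y1, y2)
product_kernel = (x1 @ y1) * (x2 @ y2)
print(np.isclose(fused_kernel, product_kernel))  # True

# Toy "cluster union": embedding A separates {0, 1} from {2, 3},
# embedding B separates {0, 2} from {1, 3} (one-hot cluster indicators).
EA = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
EB = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
K_fused = fused_gram(EA, EB)
# Equals the elementwise product of the parent Gram matrices. Here it is the
# 4x4 identity, so every point lands in its own cluster.
print(np.allclose(K_fused, (EA @ EA.T) * (EB @ EB.T)))  # True
print(K_fused)
```

Neither parent embedding can separate all four points on its own. The fused kernel separates every pair that at least one parent separates, which here means all of them.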

Videos

When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product (SlidesLive)

Taxonomy

Methods focus: Contrastive Language-Image Pre-training (CLIP) · BLIP: Bootstrapping Language-Image Pre-training