Uncertainty-driven Embedding Convolution

Sungjun Lim; Kangjun Noh; Youngjun Choi; Heeyoung Lee; and Kyungwoo Song

arXiv:2507.20718·cs.LG·February 13, 2026

TL;DR

This paper introduces Uncertainty-driven Embedding Convolution (UEC), a novel ensemble method that transforms deterministic text embeddings into probabilistic ones, using uncertainty to improve NLP task performance and robustness.

Contribution

UEC is the first approach to incorporate model-specific uncertainty into ensemble coefficients and similarity scoring for text embeddings, enhancing robustness and accuracy.

Findings

01

UEC outperforms existing ensemble methods across multiple benchmarks.

02

Incorporating uncertainty improves robustness against domain shifts.

03

Probabilistic embeddings lead to better similarity measures.

Abstract

Text embeddings are essential components in modern NLP pipelines. Although numerous embedding models have been proposed, no single model consistently dominates across domains and tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble coefficients based on embedding uncertainty, derived from a principled surrogate-loss formulation. Additionally, UEC employs an uncertainty-aware similarity function that directly incorporates uncertainty…
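The abstract is truncated here and gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each model already yields a diagonal-Gaussian embedding (a mean and a per-dimension variance) in a shared space, that the models are independent, and it swaps in two textbook techniques named in the reviews below: inverse-variance weighting for the ensemble coefficients and a probit-style variance correction for the similarity score. The function names `fuse_gaussian_embeddings` and `probit_similarity` are hypothetical.

```python
import numpy as np


def fuse_gaussian_embeddings(means, variances):
    """Fuse K diagonal-Gaussian embeddings with inverse-variance weights.

    means, variances: array-likes of shape (K, D). Assumes the K embeddings
    are independent and live in one shared D-dimensional space (assumptions
    of this sketch, not statements about the paper).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    # Per-dimension coefficients: more certain models get larger weight.
    weights = precisions / precisions.sum(axis=0, keepdims=True)
    fused_mean = (weights * means).sum(axis=0)
    # Variance of a weighted sum of independent Gaussians.
    fused_var = (weights ** 2 * variances).sum(axis=0)
    return fused_mean, fused_var


def probit_similarity(mu_q, var_q, mu_d, var_d):
    """Dot-product score shrunk by its variance (probit-style correction).

    For independent diagonal Gaussians q and d, the dot product has
    mean   sum_k mu_q[k] * mu_d[k]
    var    sum_k (mu_q[k]^2 var_d[k] + mu_d[k]^2 var_q[k] + var_q[k] var_d[k]).
    The score is mean / sqrt(1 + pi * var / 8), the usual approximation to
    the expected sigmoid of a Gaussian logit, taken before the sigmoid.
    """
    mu_q, var_q, mu_d, var_d = (
        np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_d, var_d)
    )
    score_mean = float(mu_q @ mu_d)
    score_var = float((mu_q ** 2 * var_d + mu_d ** 2 * var_q + var_q * var_d).sum())
    return score_mean / np.sqrt(1.0 + np.pi * score_var / 8.0)
```

Under these assumptions, the fused variance equals the reciprocal of the summed precisions, so the ensemble is never less certain than its most certain member in any dimension. The similarity score reduces to the plain dot product when both variances are zero and shrinks toward zero as uncertainty grows.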

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 2·Confidence 4

Strengths

- This work provides a way to ensemble embeddings in an uncertainty aware manner.

Weaknesses

- This work seems to contain a lot of jargon without providing any insight into what the proposed method is or how it fits into the existing literature.
- This work has major theoretical flaws.

Reviewer 02·Rating 4·Confidence 2

Strengths

1. **Principled probabilistic foundation**: Correct derivation of Gaussian embeddings from Laplace posteriors.
2. **Theoretical rigor**: Ranking-preserving link between probit similarity, Wasserstein-2, and Jeffreys divergence.
3. **Practicality**: Post-hoc, model-agnostic, and nearly cost-free in computation.
4. **Empirical consistency**: Strong performance across MMTEB tasks and clear ablations.
5. **Trustworthy AI alignment**: Explicit uncertainty propagation enhances inte

Weaknesses

1. **Limited type of uncertainty.** The method models only **epistemic uncertainty** (parameter uncertainty via Laplace Approximation on the last layer). It does not capture **aleatoric** or **predictive uncertainty**, nor propagate epistemic uncertainty through deeper layers.
2. **Strong independence and diagonal assumptions.** The Gaussian convolution assumes independent embeddings and diagonal covariances. These assumptions are rarely valid for transformer-based e

Reviewer 03·Rating 4·Confidence 3

Strengths

1. The method is conceptually clean and well grounded in existing literature.
2. The paper is clearly written and mostly mathematically correct, although some assumptions are at times not explicitly defined.
3. The empirical results are convincing, with the authors covering retrieval, classification, and semantic textual similarity.
4. The approach is practical, post-hoc, scalable, and effective without model retraining.

Weaknesses

1. The core components are standard statistical techniques: Laplace approximation, inverse-variance weighting for Gaussian combination, and uncertainty-aware similarity based on probit approximations. Hence, I think the paper’s novelty lies mostly in assembling these ingredients into a single coherent pipeline for embedding ensembles, not in introducing fundamentally new theory.
2. The “Bayes-optimal” claim is slightly overstated given that the surrogate loss drops query-dependent terms (Eq. 3).
