Nomic Embed Vision: Expanding the Latent Space

Zach Nussbaum; Brandon Duderstadt; Andriy Mulyar

arXiv:2406.18587 · cs.CV · June 28, 2024 · 1 citation

TL;DR

This paper introduces nomic-embed-vision, an image embedding model that shares its latent space with the nomic-embed-text text embedding model, so that the pair performs well across vision, language, and multimodal tasks.

Contribution

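It presents nomic-embed-vision, an open-code, open-weights image embedding model trained to share the latent space of nomic-embed-text. Together, the two models form what the authors describe as the first unified latent space to achieve high performance across vision, language, and multimodal tasks.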

Findings

01 Shared latent space enhances multimodal task performance

02 Open-source model facilitates research and application development

03 Achieves high accuracy in vision and language tasks

Abstract

This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text. Together, nomic-embed-vision and nomic-embed-text form the first unified latent space to achieve high performance across vision, language, and multimodal tasks.
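
To make the idea of a shared latent space concrete, the following is a minimal sketch of cross-modal retrieval: once text and images are embedded in the same space, a text query can be compared directly against image vectors with cosine similarity. The function names, file names, and three-dimensional vectors are hypothetical and invented for illustration; they are not outputs of nomic-embed-vision or nomic-embed-text, and the sketch does not call either model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_vec, image_vecs):
    """Return image names sorted from most to least similar to the text vector."""
    scores = {name: cosine_similarity(text_vec, vec) for name, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors, invented for illustration only (assumption: NOT real model outputs).
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
    "dog.jpg": [0.7, 0.6, 0.1],
}

# Stand-in for an embedded text query such as "a photo of a cat" (hypothetical values).
query_embedding = [1.0, 0.2, 0.0]

ranking = rank_images(query_embedding, image_embeddings)
print(ranking)
```

In practice the vectors would come from the text and image encoders, and the ranking step would stay the same because both encoders write into one space.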

Code & Models

Repositories

nomic-ai/contrastors
pytorch

Models

🤗 nomic-ai/nomic-embed-vision-v1.5
model · 538k downloads · ♡ 215

Taxonomy

Topics: Language, Discourse, Communication Strategies