Exploring Visual Embedding Spaces Induced by Vision Transformers for Online Auto Parts Marketplaces

Cameron Armijo; Pablo Rivas

arXiv:2502.05756·cs.CV·February 11, 2025

Open Access

TL;DR

This paper investigates the use of Vision Transformers to generate and analyze visual embeddings of auto parts from online marketplaces, aiming to detect illicit activities through pattern recognition in image data.

Contribution

It demonstrates the application of ViT-based embeddings combined with dimensionality reduction and clustering to analyze visual patterns in online auto parts listings, highlighting both strengths and limitations.

Findings

01. ViT effectively isolates visual patterns in auto parts images.

02. Clustering reveals meaningful groupings but faces challenges with overlaps.

03. Single-modal approach has limitations in complex marketplace data.

Abstract

This study examines the capabilities of the Vision Transformer (ViT) model in generating visual embeddings for images of auto parts sourced from online marketplaces, such as Craigslist and OfferUp. By focusing exclusively on single-modality data, the analysis evaluates ViT's potential for detecting patterns indicative of illicit activities. The workflow involves extracting high-dimensional embeddings from images, applying dimensionality reduction techniques like Uniform Manifold Approximation and Projection (UMAP) to visualize the embedding space, and using K-Means clustering to categorize similar items. Representative posts nearest to each cluster centroid provide insights into the composition and characteristics of the clusters. While the results highlight the strengths of ViT in isolating visual patterns, challenges such as overlapping clusters and outliers underscore the limitations…
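
The clustering and representative-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are synthetic 4-dimensional vectors standing in for ViT outputs, the K-Means loop is a plain standard-library version of Lloyd's algorithm, and the UMAP visualization step is omitted. All function names (`make_toy_embeddings`, `k_means`, `representatives`) and parameter values are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def make_toy_embeddings(n_per_group=20, dim=4, seed=0):
    """Synthetic stand-in for image embeddings: two well-separated groups.
    ASSUMPTION: real ViT embeddings are much higher-dimensional and noisier."""
    rng = random.Random(seed)
    centers = [[0.0] * dim, [10.0] * dim]
    points = []
    for center in centers:
        for _ in range(n_per_group):
            points.append([c + rng.gauss(0, 0.5) for c in center])
    return points


def k_means(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


def representatives(points, labels, centroids, top_n=1):
    """For each cluster, indices of the top_n members nearest its centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: squared_distance(points[i], centroid))
        reps[c] = members[:top_n]
    return reps


if __name__ == "__main__":
    embeddings = make_toy_embeddings()
    labels, centroids = k_means(embeddings, k=2)
    for cluster, indices in representatives(embeddings, labels, centroids, top_n=3).items():
        print(f"cluster {cluster}: representative item indices {indices}")
```

In a marketplace setting, the returned indices would point back to the listings whose images sit closest to each centroid, which is how a cluster can be inspected by eye. With real data, overlapping groups and outliers would make the cluster boundaries far less clean than in this toy example.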

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection