Features extraction for image identification using computer vision

Venant Niyonkuru; Sylla Sekou; Jimmy Jackson Sinzinkayo

arXiv:2507.18650·cs.CV·July 28, 2025

TL;DR

This paper compares various feature extraction techniques in computer vision, highlighting the superior performance of Vision Transformers over traditional CNNs and analyzing their architectures and applications.

Contribution

It provides a comprehensive comparison of feature extraction methods, emphasizing the architecture and advantages of Vision Transformers in computer vision tasks.

Findings

1. Vision Transformers outperform CNNs in feature extraction.
2. Traditional methods such as SIFT and SURF are still relevant for certain applications.
3. Experimental results highlight the strengths and limitations of each approach.

Abstract

This study examines various feature extraction techniques in computer vision. Its primary focus is on Vision Transformers (ViTs), and it also covers Generative Adversarial Networks (GANs), deep feature models, traditional approaches (SIFT, SURF, ORB), and contrastive and non-contrastive feature models. Emphasizing ViTs, the report summarizes their architecture, including patch embedding, positional encoding, and multi-head self-attention, the mechanisms through which they outperform conventional convolutional neural networks (CNNs). Experimental results identify the merits and limitations of both approaches and their practical applications in advancing computer vision.
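The three ViT components named in the abstract (patch embedding, positional encoding, multi-head self-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' code, and it rests on several assumptions: all weights are random and untrained, a single attention layer with a residual connection stands in for a full encoder, a prepended CLS token serves as the image-level feature, and sinusoidal positional encoding (from the original Transformer) is swapped in for the learned position embeddings that ViTs typically use. All function names and parameters (patchify, sinusoidal_positions, vit_features, patch_size, dim, num_heads) are hypothetical.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches: (N, P*P*C)."""
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    patches = image.reshape(h // p, p, w // p, p, c)  # (rows, p, cols, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (rows, cols, p, p, C)
    return patches.reshape(-1, p * p * c)             # row-major patch order


def sinusoidal_positions(num_positions, dim):
    """Fixed sin/cos positional encoding (assumption: ViTs usually learn these)."""
    assert dim % 2 == 0, "dim must be even"
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000 ** (2 * i / dim))
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product attention computed independently per head, then merged."""
    n, d = x.shape
    assert d % num_heads == 0, "dim must be divisible by num_heads"
    d_head = d // num_heads

    def split(t):  # (n, d) -> (heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)  # concatenate heads
    return out @ w_o, weights


def vit_features(image, patch_size=4, dim=16, num_heads=4, seed=0):
    """Hypothetical one-layer ViT front end with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch_size)
    w_embed = rng.normal(0.0, 0.02, (patches.shape[1], dim))  # linear patch embedding
    cls = rng.normal(0.0, 0.02, (1, dim))                     # CLS token
    tokens = np.vstack([cls, patches @ w_embed])
    tokens = tokens + sinusoidal_positions(tokens.shape[0], dim)
    w_q, w_k, w_v, w_o = (rng.normal(0.0, dim ** -0.5, (dim, dim)) for _ in range(4))
    attended, weights = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)
    tokens = tokens + attended  # residual connection
    return tokens[0], tokens, weights


image = np.random.default_rng(1).random((32, 32, 3))
feature, tokens, weights = vit_features(image)
print(tokens.shape)   # (65, 16): 64 patches of 4x4 plus one CLS token
print(feature.shape)  # (16,)
print(weights.shape)  # (4, 65, 65): one attention map per head
```

Because every token attends to every other token in a single layer, each patch feature can draw on the whole image at once, whereas a CNN's receptive field grows only gradually with depth. In a trained model, the CLS vector returned here would be the extracted feature passed to a classifier.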
