Features extraction for image identification using computer vision
Venant Niyonkuru, Sylla Sekou, Jimmy Jackson Sinzinkayo

TL;DR
This paper compares various feature extraction techniques in computer vision, highlighting the superior performance of Vision Transformers over traditional CNNs and analyzing their architectures and applications.
Contribution
It provides a comprehensive comparison of feature extraction methods, emphasizing the architecture and advantages of Vision Transformers in computer vision tasks.
Findings
Vision Transformers outperform CNNs in feature extraction.
Traditional methods like SIFT and SURF are still relevant for certain applications.
Experimental results highlight the strengths and limitations of each approach.
Abstract
This study examines feature extraction techniques in computer vision, focusing primarily on Vision Transformers (ViTs) alongside other approaches: Generative Adversarial Networks (GANs), deep feature models, traditional methods (SIFT, SURF, ORB), and contrastive and non-contrastive feature models. With its emphasis on ViTs, the report summarizes their architecture, including patch embedding, positional encoding, and the multi-head self-attention mechanisms with which they outperform conventional convolutional neural networks (CNNs). Experimental results establish the merits and limitations of each method and their practical applications in advancing computer vision.
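
The three ViT components named in the abstract (patch embedding, positional encoding, multi-head self-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`patchify`, `multi_head_self_attention`, `vit_features`), the patch size, the embedding dimension, and the randomly initialized (untrained) weights are all assumptions made for illustration. A real ViT also adds a class token, layer normalization, residual connections, and MLP blocks, which are omitted here.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, ps, gw, ps, C) -> (gh, gw, ps, ps, C) -> (num_patches, ps*ps*C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over tokens x of shape (n, d)."""
    n, d = x.shape
    head_dim = d // num_heads

    def split(t):
        # (n, d) -> (heads, n, head_dim)
        return t.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)            # (heads, n, n)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ w_o, weights

def vit_features(image, patch_size=8, dim=32, num_heads=4, seed=0):
    """Hypothetical single-layer feature extractor with untrained weights."""
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch_size)
    n, p = patches.shape
    w_embed = rng.normal(0, 0.02, (p, dim))       # patch embedding (linear map)
    pos = rng.normal(0, 0.02, (n, dim))           # positional encoding (learned in practice)
    tokens = patches @ w_embed + pos
    w_q, w_k, w_v, w_o = (rng.normal(0, 0.02, (dim, dim)) for _ in range(4))
    return multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)

image = np.random.default_rng(1).random((32, 32, 3))
features, attention = vit_features(image)
print(features.shape, attention.shape)
```

Each row of `attention` is a probability distribution over all patches, which is how a single layer can relate distant image regions, in contrast to the local receptive field of a convolution.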
