YotoR-You Only Transform One Representation

José Ignacio Díaz Villa; Patricio Loncomilla; Javier Ruiz-del-Solar

arXiv:2405.19629·cs.CV·May 31, 2024·1 cites

Open Access

TL;DR

YotoR is a deep learning object detector that combines a Swin Transformer backbone with the YoloR neck and head. It reports better accuracy than YoloR P6 and Swin Transformer models, and faster inference than Swin Transformer models.

Contribution

The paper introduces YotoR, a novel integration of Swin Transformers and YoloR, and reports that its TP5 and BP4 variants outperform YoloR P6 and Swin Transformer models in object detection.

Findings

01 YotoR models outperform YoloR P6 and Swin Transformers in accuracy.

02 YotoR achieves faster inference speeds than Swin Transformer models.

03 YotoR shows potential for enhancing real-time object detection with Transformers.

Abstract

This paper introduces YotoR (You Only Transform One Representation), a novel deep learning model for object detection that combines Swin Transformers and YoloR architectures. Transformers, a revolutionary technology in natural language processing, have also significantly impacted computer vision, offering the potential to enhance accuracy and computational efficiency. YotoR combines the robust Swin Transformer backbone with the YoloR neck and head. In our experiments, YotoR models TP5 and BP4 consistently outperform YoloR P6 and Swin Transformers in various evaluations, delivering improved object detection performance and faster inference speeds than Swin Transformer models. These results highlight the potential for further model combinations and improvements in real-time object detection with Transformers. The paper concludes by emphasizing the broader implications of YotoR, including…
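
To make the backbone-to-neck handoff described in the abstract more concrete, the following is a minimal sketch of the feature-map shapes a hierarchical Swin-style backbone produces. It assumes the published Swin-T defaults (4x4 patch embedding, embedding dimension 96, resolution halved and channels doubled at each later stage). The abstract does not say which stages YotoR passes to the YoloR neck, so `select_for_head` and its `num_levels` parameter are hypothetical and only illustrate the idea of multi-scale inputs; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FeatureMap:
    """Shape of one backbone output: channel count and spatial size."""
    channels: int
    height: int
    width: int


def swin_stage_shapes(height: int, width: int, embed_dim: int = 96,
                      patch_size: int = 4, num_stages: int = 4) -> List[FeatureMap]:
    """Return the output shape of each stage of a Swin-style backbone.

    Stage 1 follows a patch embedding with stride `patch_size`. Every later
    stage begins with patch merging, which halves the resolution and doubles
    the number of channels.
    """
    total_stride = patch_size * 2 ** (num_stages - 1)
    if height % total_stride or width % total_stride:
        raise ValueError(f"input size must be divisible by {total_stride}")
    shapes = []
    for stage in range(num_stages):
        stride = patch_size * 2 ** stage
        shapes.append(FeatureMap(embed_dim * 2 ** stage,
                                 height // stride, width // stride))
    return shapes


def select_for_head(shapes: List[FeatureMap], num_levels: int = 3) -> List[FeatureMap]:
    """Hypothetical wiring: hand the deepest `num_levels` maps to a multi-scale neck."""
    return shapes[-num_levels:]


if __name__ == "__main__":
    # Print the per-stage shapes for a 640x640 input.
    for fmap in swin_stage_shapes(640, 640):
        print(fmap)
```

A neck designed for a convolutional backbone expects particular channel counts and strides at each level, so a shape table like this is a quick way to check where adapter layers would be needed when swapping in a Transformer backbone.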

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings · Softmax · Layer Normalization