IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision

Maxwell Meyer; Jack Spruyt

arXiv:2411.00252 · cs.CV · November 4, 2024

TL;DR

This paper introduces SwinV2-based reward models for computer vision that evaluate the output quality of other models with high accuracy, extending transformer applications beyond their traditional tasks.

Contribution

The paper presents two SwinV2-based reward models, the Input-Output Transformer (IO Transformer) and the Output Transformer, for evaluating model outputs. It shows their effectiveness across vision tasks and explores modifications to the Swin V2 architecture.

Findings

1. The IO Transformer achieves perfect evaluation accuracy on Change Dataset 25 (CD25).

2. Swin V2 scores 95.41% on the IO Segmentation Dataset.

3. Swin V2 outperforms the IO Transformer when the output is not entirely dependent on the input.

Abstract

Transformers and their derivatives have achieved state-of-the-art performance across text, vision, and speech recognition tasks. However, minimal effort has been made to train transformers capable of evaluating the output quality of other models. This paper examines SwinV2-based reward models, called the Input-Output Transformer (IO Transformer) and the Output Transformer. These reward models can be leveraged for tasks such as inference quality evaluation, data categorization, and policy optimization. Our experiments demonstrate highly accurate model output quality assessment across domains where the output is entirely dependent on the input, with the IO Transformer achieving perfect evaluation accuracy on the Change Dataset 25 (CD25). We also explore modified Swin V2 architectures. Ultimately, Swin V2 remains on top with a score of 95.41% on the IO Segmentation Dataset, outperforming…
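To make the distinction between an input-output reward model and an output-only reward model concrete, here is a toy sketch in plain Python. It is not the paper's method: the hand-written functions `io_reward` and `output_reward`, the thresholding "segmentation" task, and the example data are all hypothetical stand-ins for learned SwinV2 models and real datasets. It only illustrates why seeing the input matters when the output is entirely determined by it: an IO-style scorer can reject an output that is well-formed but wrong for its input, while an output-only scorer cannot.

```python
from typing import List, Sequence, Tuple

# Toy stand-in: an "image" is a flat list of pixel intensities in [0, 1].
Image = List[float]


def reference_mask(image: Image) -> Image:
    """Hypothetical task whose output depends entirely on the input:
    binary segmentation by thresholding each pixel at 0.5."""
    return [1.0 if p > 0.5 else 0.0 for p in image]


def io_reward(image: Image, output: Image) -> float:
    """IO-style scorer (hypothetical): sees input AND output, so it can
    measure how consistent the output is with the input."""
    target = reference_mask(image)
    matches = sum(1 for t, o in zip(target, output) if t == o)
    return matches / len(target)


def output_reward(output: Image) -> float:
    """Output-only scorer (hypothetical): sees just the output, so it can
    only check input-independent properties, here that every pixel is a
    valid binary label."""
    valid = sum(1 for o in output if o in (0.0, 1.0))
    return valid / len(output)


def evaluation_accuracy(scores: Sequence[float], labels: Sequence[bool],
                        threshold: float = 1.0) -> float:
    """Fraction of examples where (score >= threshold) matches the
    ground-truth good/bad label."""
    correct = sum((s >= threshold) == label for s, label in zip(scores, labels))
    return correct / len(labels)


# Each example: (input image, candidate output, is_good label).
_INPUTS = [[0.9, 0.2, 0.7], [0.1, 0.6, 0.4]]
EXAMPLES: List[Tuple[Image, Image, bool]] = [
    (_INPUTS[0], reference_mask(_INPUTS[0]), True),
    (_INPUTS[0], [0.0, 1.0, 1.0], False),  # valid labels, wrong mask
    (_INPUTS[1], reference_mask(_INPUTS[1]), True),
    (_INPUTS[1], [1.0, 1.0, 0.0], False),  # valid labels, wrong mask
]

if __name__ == "__main__":
    labels = [good for _, _, good in EXAMPLES]
    io_scores = [io_reward(x, y) for x, y, _ in EXAMPLES]
    out_scores = [output_reward(y) for _, y, _ in EXAMPLES]
    print("IO accuracy:", evaluation_accuracy(io_scores, labels))
    print("Output-only accuracy:", evaluation_accuracy(out_scores, labels))
```

On these four toy examples the script prints an IO accuracy of 1.0 and an output-only accuracy of 0.5, because every bad output is a valid binary mask that is only wrong relative to its input. The numbers say nothing about the paper's reported results.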

Taxonomy

Topics: Infrared Target Detection Methodologies · CCD and CMOS Imaging Sensors · Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings