Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection

Haruka Matsuo; Shintaro Ishikawa; Komei Sugiura

arXiv:2407.05063·cs.RO·July 9, 2024

TL;DR

This paper proposes a Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection (RTD), the task of generating a change mask for the objects that should be rearranged given the goal and current states of a scene. The model is reported to outperform baseline methods in F1-score and mean IoU.

Contribution

The study presents a transformer-based architecture for RTD built from a Serial Encoder and a Cross-Attentional Encoder, along with a new dataset for evaluation.

Findings

1. Outperforms baseline methods in F1-score
2. Achieves higher mean IoU on the new dataset
3. Effective in detecting objects with complex shapes and orientations
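
The findings above are stated in terms of F1-score and mean IoU over predicted change masks. As a rough illustration of how such metrics can be computed for binary masks, here is a minimal sketch in plain Python. The function names (`mask_iou`, `mask_f1`, `mean_iou`) are hypothetical, and averaging IoU over samples is an assumption; the paper's exact metric definitions are not given on this page and may differ (for example, by averaging over classes).

```python
def _confusion(pred, target):
    """Count true positives, false positives, and false negatives
    for two equally sized binary masks (2D lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def mask_iou(pred, target):
    """Intersection over union of the changed (value 1) regions."""
    tp, fp, fn = _confusion(pred, target)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / union if union else 1.0


def mask_f1(pred, target):
    """Pixel-level F1-score of the changed (value 1) regions."""
    tp, fp, fn = _confusion(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(preds, targets):
    """Average IoU over a list of (prediction, ground truth) mask pairs."""
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy 2x3 masks, invented purely for illustration.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(mask_iou(pred, target))  # 2 overlapping pixels out of 4 in the union
print(mask_f1(pred, target))
```

IoU and F1 are monotonically related for a single mask pair, but their averages over a dataset can rank models differently, which is one reason both are commonly reported.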

Abstract

Rearranging objects (e.g., vases, doors) back to their original positions is one of the most fundamental skills for domestic service robots (DSRs). In rearrangement tasks, it is crucial to detect the objects that need to be rearranged according to the goal and current states. In this study, we focus on Rearrangement Target Detection (RTD), where the model generates a change mask for objects that should be rearranged. Although many studies have been conducted in the field of Scene Change Detection (SCD), most SCD methods often fail to segment objects with complex shapes and fail to detect changes in the angle of objects that can be opened or closed. In this study, we propose a Co-Scale Cross-Attentional Transformer for RTD. We introduce the Serial Encoder, which consists of a sequence of serial blocks, and the Cross-Attentional Encoder, which models the relationship between the goal and…
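
The abstract is cut off before it finishes describing the Cross-Attentional Encoder, so the following is only a generic sketch of cross-attention between two sets of token features, not the paper's architecture. It assumes that tokens from the current state act as queries and tokens from the goal state act as keys and values; that direction, the toy feature values, and the names `cross_attention`, `current_tokens`, and `goal_tokens` are all hypothetical. Learned projection matrices and multiple heads are omitted, and plain Python is used so the example runs without dependencies (the linked repository itself is PyTorch).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where each query attends over all keys.

    queries: list of d-dimensional vectors (assumed: current-state tokens)
    keys, values: lists of vectors of equal length (assumed: goal-state tokens)
    Returns one attended vector per query.
    """
    d = len(keys[0])
    out_dim = len(values[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(out_dim)
        ])
    return attended


# Toy token features, invented purely for illustration.
current_tokens = [[1.0, 0.0], [0.0, 1.0]]
goal_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended = cross_attention(current_tokens, goal_tokens, goal_tokens)
print(len(attended), len(attended[0]))  # one output vector per current token
```

The output has one vector per query token, so features from one image can be enriched with information from the other before any mask is decoded.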

Code & Models

Repositories

keio-smilab24/Co-Scale_Cross-Attentional_Transformer
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Machine Learning in Bioinformatics · Domain Adaptation and Few-Shot Learning