Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection
Haruka Matsuo, Shintaro Ishikawa, Komei Sugiura

TL;DR
This paper introduces a Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection (RTD). The model generates a change mask for the objects that should be rearranged in a scene and outperforms baseline methods in F1-score and mean IoU.
Contribution
The study presents a new transformer-based architecture with serial and cross-attentional encoders tailored for RTD, along with a new dataset for evaluation.
Findings
Outperforms baseline methods in F1-score
Achieves higher mean IoU on the new dataset
Effective in detecting objects with complex shapes and orientations
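The two metrics named above can be illustrated on binary change masks. This is a minimal sketch, not the paper's evaluation code: the function names `f1_score` and `mean_iou` are hypothetical, and averaging IoU over the "changed" and "unchanged" classes is an assumption, since this summary does not state how the mean is taken.

```python
import numpy as np

def f1_score(pred, target):
    """F1 of the 'changed' class for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): F1 is 0.0 when both masks are empty.
    return float(2 * tp / denom) if denom > 0 else 0.0

def mean_iou(pred, target):
    """IoU averaged over the 'unchanged' and 'changed' classes (assumed averaging)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    ious = []
    for cls in (False, True):
        p, t = (pred == cls), (target == cls)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks, skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x4 masks: the prediction marks one extra pixel as changed.
pred = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
target = np.array([[1, 0, 0, 0], [0, 0, 0, 0]])
f1 = f1_score(pred, target)
miou = mean_iou(pred, target)
```

In the toy example there is one true positive and one false positive, so F1 is 2/3, and the per-class IoUs are 1/2 (changed) and 6/7 (unchanged).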
Abstract
Rearranging objects (e.g., vase, door) back to their original positions is one of the most fundamental skills for domestic service robots (DSRs). In rearrangement tasks, it is crucial to detect the objects that need to be rearranged according to the goal and current states. In this study, we focus on Rearrangement Target Detection (RTD), where the model generates a change mask for objects that should be rearranged. Although many studies have been conducted in the field of Scene Change Detection (SCD), most SCD methods fail to segment objects with complex shapes and to detect changes in the angle of objects that can be opened or closed. In this study, we propose a Co-Scale Cross-Attentional Transformer for RTD. We introduce the Serial Encoder, which consists of a sequence of serial blocks, and the Cross-Attentional Encoder, which models the relationship between the goal and…
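The abstract is truncated here and does not specify how the Cross-Attentional Encoder is built. As a rough illustration of the general idea of cross-attention between two sets of image features, the sketch below implements generic single-head scaled dot-product cross-attention in NumPy. It is not the paper's architecture: the names `cross_attention`, `goal_tokens`, and `current_tokens`, the dimensions, the random weights, and the choice of which input supplies the queries are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from one token set to another."""
    q = query_feats @ w_q        # (n_query, d_head)
    k = context_feats @ w_k      # (n_context, d_head)
    v = context_feats @ w_v      # (n_context, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query row sums to 1
    return weights @ v, weights

# Hypothetical inputs: 6 feature tokens per image, 8 channels each.
rng = np.random.default_rng(0)
d_model, d_head = 8, 4
goal_tokens = rng.normal(size=(6, d_model))
current_tokens = rng.normal(size=(6, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

# Assumption: tokens from the current state attend to tokens from the goal state.
attended, weights = cross_attention(current_tokens, goal_tokens, w_q, w_k, w_v)
```

Each row of `weights` indicates how strongly one token of the first input attends to every token of the second, which is one common way to relate features from two images of the same scene.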
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Machine Learning in Bioinformatics · Domain Adaptation and Few-Shot Learning
