Image Transformation Sequence Retrieval with General Reinforcement Learning
Enrique Mas-Candela, Antonio R\'ios-Vila, Jorge Calvo-Zaragoza

TL;DR
This paper introduces a new task called Image Transformation Sequence Retrieval, proposing a reinforcement learning approach with MCTS that outperforms supervised methods in synthetic and real domains.
Contribution
The paper presents the first application of model-based reinforcement learning with MCTS to the ITSR task, demonstrating its effectiveness over supervised training.
Findings
MCTS-based models outperform supervised models in ITSR tasks.
The approach works well in both synthetic and real image domains.
ITSR presents unique challenges due to multiple correct sequences.
Abstract
In this work, the novel Image Transformation Sequence Retrieval (ITSR) task is presented, in which a model must retrieve the sequence of transformations between two given images that act as source and target, respectively. Given certain characteristics of the challenge such as the multiplicity of a correct sequence or the correlation between consecutive steps of the process, we propose a solution to ITSR using a general model-based Reinforcement Learning such as Monte Carlo Tree Search (MCTS), which is combined with a deep neural network. Our experiments provide a benchmark in both synthetic and real domains, where the proposed approach is compared with supervised training. The results report that a model trained with MCTS is able to outperform its supervised counterpart in both the simplest and the most complex cases. Our work draws interesting conclusions about the nature of ITSR and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
