ICAR: Image-based Complementary Auto Reasoning

Xijun Wang; Anqi Liang; Junbang Liang; Ming Lin; Yu Lou; Shan Yang

arXiv:2308.09119·cs.CV·August 21, 2023

TL;DR

This paper introduces ICAR, a novel framework for scene-aware complementary item retrieval that leverages a flexible bidirectional transformer to understand visual compatibility and generate compatible items across domains.

Contribution

It proposes a category-aware transformer model that learns inter-object compatibility from large scene datasets in a self-supervised manner, improving retrieval performance.

Findings

01. Achieves FITB score improvements of up to 5.3% and 9.6% on fashion and furniture, respectively.

02. Realizes SFID improvements of 22.3% and 31.8% over state-of-the-art methods.

03. Introduces a generalizable cross-domain visual similarity embedding approach.
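
For readers unfamiliar with the FITB (fill-in-the-blank) score cited in the findings, the sketch below shows how such an accuracy is typically computed: each question provides a partial set of items plus several candidates, and the score is the fraction of questions where the chosen candidate is the ground-truth item. The scoring rule used here (mean cosine similarity to the context items) is an assumption for illustration only; it is not ICAR's transformer-based compatibility model, and the names `cosine`, `pick_candidate`, and `fitb_accuracy` and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_candidate(context, candidates):
    """Return the index of the candidate with the highest mean similarity to the context.

    Assumption: compatibility is approximated by mean cosine similarity,
    a stand-in for a learned compatibility model.
    """
    scores = [
        sum(cosine(cand, item) for item in context) / len(context)
        for cand in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


def fitb_accuracy(questions):
    """Fraction of FITB questions where the picked candidate is the ground truth."""
    correct = sum(
        pick_candidate(q["context"], q["candidates"]) == q["answer"]
        for q in questions
    )
    return correct / len(questions)


# Toy 2-D embeddings, purely illustrative.
questions = [
    {
        "context": [[1.0, 0.0], [0.9, 0.1]],
        "candidates": [[0.0, 1.0], [1.0, 0.1]],
        "answer": 1,
    },
    {
        "context": [[0.0, 1.0]],
        "candidates": [[0.1, 1.0], [1.0, 0.0]],
        "answer": 0,
    },
]

print(fitb_accuracy(questions))
```

In a real evaluation the similarity rule would be replaced by the model under test, while the accuracy computation stays the same.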

Abstract

Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Because compatibility is subjective, it is difficult to set up a rigorous standard for either data collection or learning objectives. To address this task, we propose a visual compatibility concept composed of similarity (resemblance in color, geometry, texture, etc.) and complementarity (different items, such as a table and a chair, completing a group). Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with cross-domain visual similarity input and auto-regressive complementary item generation. The FBT consists of an encoder with flexible masking, a category prediction arm,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection