Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information

Youngchae Kwon; Jinyoung Choi; Injung Kim

arXiv:2512.23221·cs.CV·April 20, 2026

TL;DR

Holi-DETR is a novel detection transformer that holistically detects fashion items by leveraging contextual relationships such as co-occurrence, spatial arrangements, and body key-points, improving detection accuracy.

Contribution

It introduces a new architecture that integrates three types of contextual information into DETR for more accurate fashion item detection.
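
The first of the three cues, item co-occurrence, can be illustrated with simple label statistics. The sketch below estimates a conditional co-occurrence table P(class j appears | class i appears) from per-image label sets. It is an assumption-laden illustration of what "co-occurrence relationship" can mean, not Holi-DETR's actual mechanism (the summary does not describe how the model encodes it). The function name `cooccurrence_conditional` and the toy class ids are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_conditional(label_sets, num_classes):
    """Estimate P(class j appears | class i appears) from per-image label sets.

    Hypothetical helper for illustration only; not taken from the Holi-DETR paper.
    """
    present = Counter()  # number of images containing class i
    pair = Counter()     # number of images containing both i and j (ordered pairs)
    for labels in label_sets:
        unique = set(labels)  # count each class at most once per image
        present.update(unique)
        pair.update(permutations(unique, 2))
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        if present[i] == 0:
            continue  # class never observed: leave its row at zero
        for j in range(num_classes):
            if i != j:
                matrix[i][j] = pair[(i, j)] / present[i]
    return matrix


# Toy usage with hypothetical class ids: 0=shirt, 1=skirt, 2=pants, 3=shoes
images = [[0, 1, 3], [0, 2, 3], [0, 2], [1, 3]]
m = cooccurrence_conditional(images, num_classes=4)
print(round(m[0][2], 3))  # P(pants | shirt) -> 0.667
print(m[1][2])            # P(pants | skirt) -> 0.0
```

A table like this captures the intuition that some items (a skirt and pants) rarely appear together while others (a skirt and shoes) usually do, which is the kind of prior a holistic detector can exploit.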

Findings

1. Improved average precision (AP) by 3.6 percentage points over vanilla DETR.
2. Improved AP by 1.1 percentage points over Co-DETR.
3. Uses contextual relationships between items to reduce ambiguities in fashion item detection.

Abstract

Fashion item detection is challenging due to the ambiguities introduced by the highly diverse appearances of fashion items and the similarities among item subcategories. To address this challenge, we propose a novel Holistic Detection Transformer (Holi-DETR) that detects fashion items in outfit images holistically by leveraging contextual information. Fashion items often have meaningful relationships because they are combined to create specific styles. Unlike conventional detectors that detect each item independently, Holi-DETR detects multiple items while reducing ambiguities by leveraging three distinct types of contextual information: (1) the co-occurrence relationship between fashion items, (2) the relative position and size based on inter-item spatial arrangements, and (3) the spatial relationships between items and human body key-points. …
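
The two spatial cues in the abstract, relative position and size between items and item-to-key-point relationships, can be made concrete with simple geometric encodings. The sketch below assumes boxes in normalized (cx, cy, w, h) format and key-points as (x, y) pairs. The log-ratio size encoding and size-normalized offsets are a generic choice common in relation-aware detectors, swapped in here for illustration; they are not Holi-DETR's published formulation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized coordinates


def relative_box_feature(a: Box, b: Box) -> Tuple[float, float, float, float]:
    """Relative position and size of box b with respect to box a (hypothetical encoding)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (
        (bx - ax) / aw,     # horizontal offset in units of a's width
        (by - ay) / ah,     # vertical offset in units of a's height
        math.log(bw / aw),  # log width ratio (0 when widths match)
        math.log(bh / ah),  # log height ratio (0 when heights match)
    )


def keypoint_offsets(box: Box, keypoints: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Offsets from the box center to each body key-point, scaled by box size (hypothetical)."""
    cx, cy, w, h = box
    return [((kx - cx) / w, (ky - cy) / h) for kx, ky in keypoints]


# Toy usage: a shirt above a pair of pants, with two hypothetical hip key-points
shirt = (0.5, 0.35, 0.4, 0.3)
pants = (0.5, 0.7, 0.3, 0.4)
hips = [(0.45, 0.55), (0.55, 0.55)]
print([round(v, 3) for v in relative_box_feature(shirt, pants)])  # -> [0.0, 1.167, -0.288, 0.288]
print(keypoint_offsets(pants, hips))
```

Normalizing offsets by box size and using log ratios keeps the features roughly scale-invariant, so "pants sit below the shirt and just under the hips" looks similar whether the person fills the frame or not.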