Feature Pyramid Hashing

Yifan Yang; Libing Geng; Hanjiang Lai; Yan Pan; Jian Yin

arXiv:1904.02325·cs.CV·April 5, 2019


TL;DR

This paper introduces a two-pyramid hashing architecture that combines high-level semantic features with low-level appearance details for fine-grained image retrieval, and reports higher MAP than the compared hashing baselines on CUB-200-2011 and Stanford Dogs (Table 2).

Contribution

It proposes a novel two-pyramid structure and a consensus fusion strategy to effectively capture both semantic and subtle appearance details in deep hashing.

Findings

01. Higher MAP than all compared baselines on CUB-200-2011 at every code length, e.g. 0.5169 vs. 0.4641 (DTH) at 16 bits and 0.6233 vs. 0.5967 (DSH) at 64 bits (Table 2).

02. Capturing subtle appearance differences improves fine-grained retrieval.

03. Combining high-layer and low-layer features is beneficial.

Abstract

In recent years, deep-network-based hashing has become a leading approach for large-scale image retrieval. Most deep hashing approaches use the high layers to extract powerful semantic representations. However, these methods have limited ability for fine-grained image retrieval because the semantic features extracted from the high layers have difficulty capturing subtle differences. To this end, we propose a novel two-pyramid hashing architecture that learns both the semantic information and the subtle appearance details for fine-grained image search. Inspired by the feature pyramids of convolutional neural networks, a vertical pyramid is proposed to capture the high-layer features, and a horizontal pyramid combines multiple low-layer features with structural information to capture the subtle differences. To fuse the low-level features, a novel combination strategy, called consensus…

Tables

Table 1. Stage of ResNet18

stage	name in ResNet (He et al., 2016a)	output size	remarks
0	conv1	$112 \times 112$	$7 \times 7$, 64, stride 2
1	conv2_x	$56 \times 56$	$3 \times 3$ max pool, stride 2
			$\begin{bmatrix} 3 \times 3, 64 \\ 3 \times 3, 64 \end{bmatrix} \times 2$
2	conv3_x	$28 \times 28$	$\begin{bmatrix} 3 \times 3, 128 \\ 3 \times 3, 128 \end{bmatrix} \times 2$
3	conv4_x	$14 \times 14$	$\begin{bmatrix} 3 \times 3, 256 \\ 3 \times 3, 256 \end{bmatrix} \times 2$
4	conv5_x	$7 \times 7$	$\begin{bmatrix} 3 \times 3, 512 \\ 3 \times 3, 512 \end{bmatrix} \times 2$
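
The output sizes in Table 1 can be checked with the usual convolution size formula. The sketch below is only an illustration: the 224 x 224 input, the padding values, and the stride-2 first convolution in stages 2 to 4 are assumptions taken from the common ResNet18 configuration, not values stated in the table, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Side length after a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_stage_sizes(input_size=224):
    """Return the feature-map side length after stages 0-4 of Table 1.

    Assumptions (not given in the table): 224x224 input, padding 3 for the
    7x7 conv, padding 1 for the 3x3 max pool and the 3x3 convs, and a
    stride-2 first conv in stages 2, 3 and 4.
    """
    sizes = []
    s = conv_out(input_size, kernel=7, stride=2, padding=3)  # stage 0: conv1
    sizes.append(s)
    s = conv_out(s, kernel=3, stride=2, padding=1)           # stage 1: max pool
    sizes.append(s)  # the stride-1 3x3 convs of conv2_x keep this size
    for _ in range(3):                                       # stages 2, 3, 4
        s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes.append(s)
    return sizes


print(resnet18_stage_sizes())  # [112, 56, 28, 14, 7]
```

Under these assumptions each stage halves the spatial resolution, which matches the "output size" column.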

Table 2. MAP of Hamming ranking w.r.t. the number of bits on two fine-grained datasets.

Methods	CUB-200-2011				Stanford Dogs
	16 bits	32 bits	48 bits	64 bits	16 bits	32 bits	48 bits	64 bits
Ours	0.5169	0.5832	0.6124	0.6233	0.6340	0.6909	0.7060	0.7130
DTH (Lai et al., 2015)	0.4641	0.5454	0.5771	0.5881	0.5435	0.6258	0.6362	0.6573
DSH (Liu et al., 2016)	0.3156	0.4930	0.5408	0.5967	0.4728	0.5587	0.6128	0.6319
HashNet (Cao et al., 2017)	0.3791	0.4628	0.4853	0.5123	0.4745	0.5521	0.5575	0.5934
DPSH (Li et al., 2016)	0.3497	0.4301	0.4908	0.5225	0.4270	0.5528	0.6080	0.6231
CCA-ITQ	0.1142	0.1580	0.1813	0.1986	0.2632	0.3681	0.4175	0.4402
MLH	0.0915	0.1289	0.1281	0.1983	0.2735	0.3531	0.3831	0.4084
ITQ	0.0637	0.0907	0.1048	0.1129	0.2023	0.2838	0.3123	0.3248
SH	0.0453	0.0595	0.0643	0.0686	0.1362	0.1628	0.1859	0.1832
LSH	0.0162	0.0234	0.0302	0.0340	0.0297	0.0517	0.0640	0.0850

Table 3. Comparison of MAP with DaSH (Jin, 2018) on two fine-grained datasets.

Methods	Oxford Flower-17				Stanford Dogs
	16 bits	32 bits	48 bits	64 bits	16 bits	32 bits	48 bits	64 bits
Ours	0.9542	0.9653	0.9691	0.9783	0.6224	0.6688	0.6924	0.6974
DaSH (Jin, 2018)	0.9225	0.9267	0.9692	0.9756	0.3976	0.5283	0.5950	0.6452

Equations

$a^{(s4)} = \mathrm{Avgpool}(m^{(s4)}), \quad f^{(s4)} = \mathrm{FC}(a^{(s4)}),$

$v = \mathrm{Sigmoid}(f^{(s4)}),$

$\ell_{tri}(v_i, v_j, v_k) = \max(0,\; m_n + \|v_i - v_j\|_2^2 - \|v_i - v_k\|_2^2) \quad \text{s.t.} \quad v_i, v_j, v_k \in [0, 1]^q,$
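
As an illustration of the triplet loss above, here is a minimal pure-Python sketch. It assumes the usual reading of the formula, in which $v_i$ is the query, $v_j$ a similar image and $v_k$ a dissimilar one; the function names and the margin value passed in are hypothetical, since the value of $m_n$ is not given in the equations here.

```python
import math


def sigmoid(features):
    """Element-wise sigmoid, mapping FC outputs into [0, 1]."""
    return [1.0 / (1.0 + math.exp(-x)) for x in features]


def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(v_i, v_j, v_k, margin):
    """max(0, margin + ||v_i - v_j||^2 - ||v_i - v_k||^2).

    Assumed roles: v_i query, v_j similar, v_k dissimilar.
    `margin` stands in for m_n; its value is a free choice here.
    """
    return max(0.0, margin + squared_distance(v_i, v_j) - squared_distance(v_i, v_k))


# Toy 4-bit example with made-up FC outputs.
query = sigmoid([3.0, -3.0, 3.0, -3.0])
similar = sigmoid([2.5, -2.0, 3.5, -3.0])
dissimilar = sigmoid([-3.0, 3.0, -3.0, 3.0])
print(triplet_loss(query, similar, dissimilar, margin=1.0))
```

The loss is zero once the dissimilar image is farther from the query than the similar one by at least the margin, and positive otherwise.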

$a^{(s3)} = \mathrm{Avgpool}(m^{(s3)}), \quad f^{(s3)} = \mathrm{FC}(a^{(s3)}),$

$a^{(s2)} = \mathrm{Avgpool}(m^{(s2)}), \quad f^{(s2)} = \mathrm{FC}(a^{(s2)}),$

$f^{(M1)} = \mathrm{Avgpool}(f^{(s2)}) + f^{(s3)},$

$f^{(M2)} = \mathrm{Avgpool}(f^{(M1)}) + f^{(s4)},$

$v^{c} = \mathrm{Sigmoid}(f^{(M2)}),$
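
The fusion equations above require $\mathrm{Avgpool}(f^{(s2)})$ to have the same length as $f^{(s3)}$, and $\mathrm{Avgpool}(f^{(M1)})$ the same length as $f^{(s4)}$. The sketch below shows one way this could work, assuming (this is not stated in the equations) that the three FC outputs have lengths $4q$, $2q$ and $q$ and that the pooling averages non-overlapping windows of two elements. The function names are hypothetical.

```python
import math


def avgpool1d(x, window=2):
    """Average non-overlapping windows; assumed pooling scheme."""
    if len(x) % window != 0:
        raise ValueError("length must be a multiple of the window size")
    return [sum(x[i:i + window]) / window for i in range(0, len(x), window)]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def fuse_horizontal(f_s2, f_s3, f_s4):
    """Compute v^c = Sigmoid(Avgpool(Avgpool(f_s2) + f_s3) + f_s4)."""
    f_m1 = add(avgpool1d(f_s2), f_s3)   # f^(M1)
    f_m2 = add(avgpool1d(f_m1), f_s4)   # f^(M2)
    return [1.0 / (1.0 + math.exp(-x)) for x in f_m2]


# Toy example with q = 2: lengths 8, 4 and 2 (assumed layout).
v_c = fuse_horizontal(
    [1.0, 3.0, 5.0, 7.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0],
)
print(v_c)
```

With this layout the lower-stage features are progressively averaged down to the code length $q$ before the sigmoid is applied.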

$\ell_{tri}(v_i^c, v_j^c, v_k^c) = \max(0,\; m_n + \|v_i^c - v_j^c\|_2^2 - \|v_i^c - v_k^c\|_2^2) \quad \text{s.t.} \quad v_i^c, v_j^c, v_k^c \in [0, 1]^q,$

$\ell_{comb} = \frac{1}{M} \sum_{i=1}^{M} \left( \ell_{tri}(v_i, v_j, v_k) + \ell_{tri}(v_i^c, v_j^c, v_k^c) \right),$

$b_i = \begin{cases} 1, & v_i^c \geq 0.5 \\ 0, & v_i^c < 0.5, \end{cases}$
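
The thresholding rule above turns the real-valued output $v^c$ into a binary code, which can then be used for the Hamming ranking reported in Table 2. A minimal sketch follows; the function names and the toy database are illustrative only.

```python
def binarize(v_c, threshold=0.5):
    """b = 1 where v^c >= 0.5, else 0."""
    return [1 if x >= threshold else 0 for x in v_c]


def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


query = binarize([0.9, 0.5, 0.49, 0.1])
database = [[0, 0, 1, 1], [1, 1, 0, 1], [1, 1, 0, 0]]
print(query, rank_by_hamming(query, database))
```

Note that a value of exactly 0.5 maps to 1, following the $\geq$ in the rule.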

$AP_i = \frac{1}{N_+} \sum_{k=1}^{n} \frac{N_+^k}{k} \times pos(k)$

$MAP = \frac{1}{n_q} \sum_{i=1}^{n_q} AP_i$
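
The two evaluation formulas above can be sketched as follows. The symbol definitions are not given alongside the equations, so this assumes the standard reading: $N_+$ is the number of relevant items in the ranked list of length $n$, $N_+^k$ is the number of relevant items among the top $k$, and $pos(k)$ is 1 if the item at rank $k$ is relevant and 0 otherwise. The function names are hypothetical.

```python
def average_precision(relevance):
    """AP for one query.

    `relevance` is the ranked list of 0/1 flags, i.e. pos(k) for k = 1..n.
    Assumed: N_+ is the count of relevant items in this list.
    """
    n_pos = sum(relevance)
    if n_pos == 0:
        return 0.0  # convention chosen here for queries with no relevant item
    hits = 0        # N_+^k, relevant items seen in the top k
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / n_pos


def mean_average_precision(relevance_lists):
    """MAP: mean of AP over all n_q queries."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


print(average_precision([1, 0, 1, 1]))
print(mean_average_precision([[1, 0], [0, 1]]))  # 0.75
```

For the first example the precisions at the relevant ranks are 1/1, 2/3 and 3/4, and AP is their mean.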


Full text

Feature Pyramid Hashing

Yifan Yang

School of Data and Computer Science