Nested Hash Layer: A Plug-and-play Module for Multiple-length Hash Code Learning
Liyang He, Yuren Zhang, Rui Li, Zhenya Huang, Runze Wu, Enhong Chen

TL;DR
The paper introduces the Nested Hash Layer, a versatile module that generates multiple hash code lengths simultaneously, improving training efficiency and retrieval performance in deep supervised hashing for image retrieval.
Contribution
It proposes the Nested Hash Layer with a dynamic weighting strategy and cascade self-distillation to optimize multi-length hash code learning in a plug-and-play manner.
Findings
Training speed improved by 5 to 8 times.
Average performance increased by approximately 3.4%.
Effective multi-length hash code generation demonstrated.
Abstract
Deep supervised hashing is essential for efficient storage and search in large-scale image retrieval. Traditional deep supervised hashing models generate single-length hash codes, but this creates a trade-off between efficiency and effectiveness for different code lengths. To find the optimal length for a task, multiple models must be trained, increasing time and computation. Furthermore, relationships between hash codes of different lengths are often ignored. To address these issues, we propose the Nested Hash Layer (NHL), a plug-and-play module for deep supervised hashing models. NHL generates hash codes of multiple lengths simultaneously in a nested structure. To resolve optimization conflicts from multiple learning objectives, we introduce a dominance-aware dynamic weighting strategy to adjust gradients. Additionally, we propose a long-short cascade self-distillation method, where…
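The abstract does not spell out how the nested structure works. One common reading, assumed here and not confirmed by the text above, is that a single projection produces the longest code and every shorter code is a prefix of it, so one forward pass yields all lengths. The sketch is pure Python. `NestedHashLayer`, `relaxed`, and `forward` are hypothetical names. The tanh relaxation and sign binarization are general deep-hashing conventions, not details taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence


class NestedHashLayer:
    """Hypothetical nested hash layer: one projection to the longest length,
    with shorter codes taken as prefixes (an assumption, not the paper's spec)."""

    def __init__(self, in_dim: int, lengths: Sequence[int], seed: int = 0) -> None:
        self.lengths = sorted(lengths)
        max_len = self.lengths[-1]
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # One weight row per output bit, shared by every code length.
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(max_len)]

    def relaxed(self, feature: Sequence[float]) -> List[float]:
        # tanh is the usual differentiable surrogate for sign() during training.
        return [math.tanh(sum(w * x for w, x in zip(row, feature)))
                for row in self.weight]

    def forward(self, feature: Sequence[float]) -> Dict[int, List[int]]:
        # Binarize the longest code once (ties map to +1), then slice prefixes.
        bits = [1 if v >= 0 else -1 for v in self.relaxed(feature)]
        return {k: bits[:k] for k in self.lengths}


layer = NestedHashLayer(in_dim=8, lengths=[64, 16, 32])
codes = layer.forward([0.5, -0.2, 0.1, 0.0, 0.3, -0.7, 0.9, 0.05])
```

Under this assumption, `codes[16]` always equals `codes[32][:16]`. Training would attach one loss per length to the relaxed outputs, which is where the multi-objective conflicts discussed in the reviews arise.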
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The paper identifies a practical limitation in traditional deep supervised hashing—i.e., the inefficiency of training multiple single-length models to find an optimal hash code length—and targets it with a plug-and-play module (NHL), which aligns with the need for flexible, low-overhead solutions in large-scale image retrieval. The proposed long-short cascade self-distillation also addresses the understudied relationship between different-length hash codes, and the reported training speedup (5–8
The abstract provides no details on how the nested structure of NHL generates multiple-length codes or how the dominance-aware dynamic weighting strategy adjusts gradients. This lack of technical transparency makes the method unreproducible and unconvincing. The core idea of multi-length hash code learning is not novel, and the paper fails to articulate how NHL advances beyond these prior efforts. The 3.4% average performance gain is also modest and unsupported by analysis of when/why NHL outper
- NHL is plug-and-play and directly replaces the traditional hash layer, enabling multi-length code generation in one model without redesigning the backbone.
- The paper formalises domination gradients over nested parameters and provides a closed-form dynamic weighting (Eqs. 6–7) to keep shorter-length objectives from being overwhelmed. That is to say, Eqs. (6)–(7) act as an analytical conflict regulator across multi-length objectives. They detect when gradients from different hash lengths star
- The dynamic re-weighting is explicitly computed only on NHL parameters, not on the full network. The paper states: “we don’t consider the full network weights and focus on the parameter in NHL.” Consequently, while the results show consistent improvements across architectures and datasets, suggesting no practical instability upstream, the analysis does not report backbone-level gradient diagnostics. Any claim of “resolving cross-length interference” should be scoped to the hash layer or be sup
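The reviews describe Eqs. (6)–(7) as detecting conflicts between objectives of different lengths but do not reproduce the equations. The sketch below is therefore only a gradient-conflict diagnostic on the hash-layer weight, not the paper's weighting rule. Assuming prefix nesting, the objective for a short length touches only the first rows of the shared weight, one row per bit, so each pairwise comparison is restricted to those rows. The function reports two values:

- the cosine between the two gradients, where a negative value means they pull in opposite directions;
- the norm ratio long/short, where a large value is one plausible sign that the longer objective dominates.

`conflict_report` and the toy gradients are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # one row per output bit of the hash layer


def flatten(rows: Matrix) -> List[float]:
    return [v for row in rows for v in row]


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def conflict_report(grads: Dict[int, Matrix]) -> List[Tuple[int, int, float, float]]:
    """Hypothetical diagnostic. For each pair of code lengths (short, long),
    compare the two objectives' gradients on the rows both touch (the first
    `short` rows). Returns (short, long, cosine, norm ratio long/short)."""
    report = []
    for short, long_ in combinations(sorted(grads), 2):
        g_s = flatten(grads[short][:short])
        g_l = flatten(grads[long_][:short])
        n_s = math.sqrt(sum(x * x for x in g_s))
        n_l = math.sqrt(sum(x * x for x in g_l))
        ratio = n_l / n_s if n_s > 0 else math.inf
        report.append((short, long_, cosine(g_s, g_l), ratio))
    return report


# Toy gradients for lengths 2 and 4 on a 2-dimensional input (made up).
grads = {
    2: [[1.0, 0.0], [0.0, 1.0]],
    4: [[-3.0, 0.0], [0.0, -3.0], [1.0, 1.0], [1.0, 1.0]],
}
report = conflict_report(grads)
```

In this toy case the 4-bit objective points exactly against the 2-bit objective on the shared rows (cosine about -1.0) with three times the norm. This is the kind of situation a dominance-aware weighting would be meant to counteract.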
1. It is interesting to generate hash codes with multiple lengths in a single model.
2. The proposed method shows good performance in the experiments.
3. Dominance-Aware Dynamic Weighting and Long-short Cascade Self-distillation are well motivated and reasonable.
1. The "PLUG-AND-PLAY MODULE" is overclaimed, as the proposed module still needs to be trained with loss functions.
2. The proposed method does not address a more important issue, i.e., how to find the optimal code length for different tasks.
3. It might be necessary to compare against hash code expansion and compression methods, as they also generate hash codes of different lengths.
4. The efficiency comparison is not fair, i.e., it compares the time to train one NHL model against the time to train fiv
1. Multi-length feature learning is an interesting topic, especially for image hashing.
2. The idea of nested hash codes is reasonable, and the gradient constraint is deliberately designed according to the principle of short hashing alignment.
3. Plugging NHL into various deep hashing methods yields performance gains.
4. The paper is well organized and easy to follow.
1. My biggest concern for this work is the technical contribution, which is limited. The design of the gradient weighting is very intuitive and hard to interpret. Even though the probability of anti-domination is high, it is hard to judge whether the influence of the hand-crafted weighting in Eq. (7) is positive or negative. The self-distillation loss should also be discussed. The reason for applying distillation only between adjacent code lengths (from k+1 to k) is not clear. Some ideas
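The reviewer's question about adjacent-only distillation is easier to see with a concrete long-to-short cascade. The sketch below is minimal and rests on assumptions not stated in the text above:

- each length is represented by a batch similarity matrix of length-normalized inner products;
- each shorter code is pulled toward the next longer code only, using mean squared error.

`similarity` and `cascade_distill_loss` are hypothetical names. The paper's actual loss may use a different similarity or distance.

```python
from typing import Dict, List, Sequence


def similarity(codes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Length-normalized inner-product similarity between all pairs in a batch."""
    k = len(codes[0])
    return [[sum(a * b for a, b in zip(u, v)) / k for v in codes] for u in codes]


def cascade_distill_loss(batch_codes: Dict[int, List[List[float]]]) -> float:
    """Hypothetical long-to-short cascade: each length is matched only to the
    next longer length (adjacent pairs) via mean squared error between batch
    similarity matrices. In an autodiff framework the longer (teacher) matrix
    would be detached so gradients flow only into the shorter code."""
    lengths = sorted(batch_codes)
    total = 0.0
    for short, long_ in zip(lengths, lengths[1:]):
        s_short = similarity(batch_codes[short])
        s_long = similarity(batch_codes[long_])  # teacher; treated as constant
        n = len(s_short)
        total += sum((s_short[i][j] - s_long[i][j]) ** 2
                     for i in range(n) for j in range(n)) / (n * n)
    return total


# Two items with nested (prefix) codes of lengths 2 and 4.
full = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
batch = {2: [c[:2] for c in full], 4: full}
loss = cascade_distill_loss(batch)
```

In this toy batch the 2-bit codes see the two items as identical, while the 4-bit codes see them as orthogonal, so the loss is 0.5. Chaining only adjacent pairs means the longest code never directly supervises the shortest one. That indirect path is the design choice the reviewer asks the authors to justify.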
